metobs_toolkit.Dataset.update_gaps_and_missing_from_outliers

Dataset.update_gaps_and_missing_from_outliers(obstype='temp', n_gapsize=None)

Interpret the outliers as missing observations.

If a station has a sequence of consecutive outliers longer than n_gapsize, that sequence is interpreted as a gap.

The outliers are not removed.
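The split between gaps and missing observations can be illustrated with a minimal sketch. This is not the toolkit's implementation: `classify_outlier_runs` is a hypothetical helper, the input is a plain list of outlier flags for one station, and the sketch assumes a run must be strictly longer than n_gapsize to count as a gap.

```python
from itertools import groupby


def classify_outlier_runs(is_outlier, n_gapsize):
    """Split runs of consecutive outlier flags into gaps and missing records.

    Hypothetical helper for illustration only; not part of metobs_toolkit.
    Returns (gaps, missing), where gaps is a list of inclusive
    (start, end) index pairs and missing is a list of single indices.
    """
    gaps, missing = [], []
    position = 0
    for flagged, run in groupby(is_outlier):
        length = len(list(run))
        if flagged:
            # Assumption: a run strictly longer than n_gapsize is a gap.
            if length > n_gapsize:
                gaps.append((position, position + length - 1))
            else:
                missing.extend(range(position, position + length))
        position += length
    return gaps, missing


flags = [False, True, False, True, True, True, True, False]
gaps, missing = classify_outlier_runs(flags, n_gapsize=3)
print(gaps)     # [(3, 6)]
print(missing)  # [1]
```

In this sketch the isolated outlier at index 1 becomes a missing observation, while the run of four outliers becomes a single gap.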

Parameters:
  • obstype (str, optional) – Use the outliers on this observation type to update the gaps and missing timestamps. The default is ‘temp’.

  • n_gapsize (int, optional) – The minimum number of consecutive missing observations that defines a gap. If None, n_gapsize is taken from the gap definition in the settings. The default is None.

Return type:

None.

Note

Gaps and missing observations that result from outliers on a specific obstype are assumed to be gaps or missing observations for all obstypes.
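As a rough illustration of this note, the sketch below masks the same timestamps for every observation type once they are flagged on one of them. The data layout (a dict of lists) and the helper `mask_all_obstypes` are assumptions made for the example and do not reflect how the toolkit stores records internally.

```python
def mask_all_obstypes(records, masked_positions):
    """Return a copy of records with the given positions set to None
    for every observation type.

    Hypothetical helper for illustration only; not part of metobs_toolkit.
    """
    masked = set(masked_positions)
    return {
        obstype: [None if i in masked else value for i, value in enumerate(values)]
        for obstype, values in records.items()
    }


# Position 1 holds a temperature outlier; humidity at that time looks fine.
records = {"temp": [18.2, 55.0, 18.6], "humidity": [65, 66, 67]}
result = mask_all_obstypes(records, masked_positions=[1])
print(result)
# {'temp': [18.2, None, 18.6], 'humidity': [65, None, 67]}
```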

Note

Be aware that n_gapsize is applied at the current time resolution of the Dataset. If the dataset has been coarsened, this differs from the gap check that was applied to the imported data.
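The effect of the time resolution can be shown with simple arithmetic: the same n_gapsize covers a longer period once the records are coarsened. The value of n_gapsize and the two resolutions below are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def gap_duration(n_gapsize, resolution):
    """Time span covered by n_gapsize consecutive records at a given resolution."""
    return n_gapsize * resolution


n_gapsize = 40  # hypothetical value, not necessarily the toolkit default
for label, resolution in [("5min", timedelta(minutes=5)), ("1h", timedelta(hours=1))]:
    print(label, gap_duration(n_gapsize, resolution))
# 5min 3:20:00
# 1h 1 day, 16:00:00
```

Under these assumptions, passing an explicit n_gapsize after coarsening may be preferable when a gap should correspond to a fixed duration.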

Examples

>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...                         input_data_file=metobs_toolkit.demo_datafile,
...                         input_metadata_file=metobs_toolkit.demo_metadatafile,
...                         template_file=metobs_toolkit.demo_template,
...                         )
>>> dataset.import_data_from_file()
>>> dataset.coarsen_time_resolution(freq='1h')
>>>
>>> # Apply quality control on the temperature observations
>>> dataset.apply_quality_control(obstype='temp')  # Using the default QC settings
>>> dataset
Dataset instance containing:
      *28 stations
      *['temp', 'humidity', 'wind_speed', 'wind_direction'] observation types
      *10080 observation records
      *1676 records labeled as outliers
      *0 gaps
      *3 missing observations
      *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration:  14 days 23:00:00)
      *time zone of the records: UTC
      *Coordinates are available for all stations.

>>> # Interpret the outliers as missing/gaps
>>> dataset.update_gaps_and_missing_from_outliers(obstype='temp')
>>> dataset
Dataset instance containing:
      *28 stations
      *['temp', 'humidity', 'wind_speed', 'wind_direction'] observation types
      *10080 observation records
      *0 records labeled as outliers
      *2 gaps
      *1473 missing observations
      *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration:  14 days 23:00:00)
      *time zone of the records: UTC
      *Coordinates are available for all stations.