metobs_toolkit.Dataset.apply_quality_control#

Dataset.apply_quality_control(obstype='temp', gross_value=True, persistance=True, repetitions=True, step=True, window_variation=True)

Apply quality control methods to the dataset.

The default settings are used; they can be changed in settings_files/qc_settings.py.

The checks are performed in sequence: gross_value –> persistance –> …. Records labeled as outliers by a previous check are ignored in the following checks!
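As a rough illustration of this sequencing, the plain-Python sketch below masks the outliers of each check before the next check runs. It is not the toolkit's implementation: the function names, the thresholds and the simplified `jump` check are hypothetical.

```python
def gross_value(values, min_value=-15.0, max_value=39.0):
    """Indices of values outside [min_value, max_value] (thresholds are made up)."""
    return [i for i, v in enumerate(values)
            if v is not None and not (min_value <= v <= max_value)]


def jump(values, max_jump=5.0):
    """Hypothetical simplified spike check: compare each value with the previous valid one."""
    flagged = []
    previous = None
    for i, v in enumerate(values):
        if v is None:
            continue  # masked (already labeled) records are skipped
        if previous is not None and abs(v - previous) > max_jump:
            flagged.append(i)
        previous = v
    return flagged


def run_checks_in_sequence(values, checks):
    """Run the checks in order; records flagged by one check are masked for the next."""
    remaining = list(values)
    labels = {}
    for name, check in checks:
        for i in check(remaining):
            labels[i] = name
            remaining[i] = None  # ignored by all following checks
    return labels


values = [14.0, 60.0, 14.5, 14.5]
labels = run_checks_in_sequence(values, [("gross_value", gross_value), ("jump", jump)])
print(labels)        # {1: 'gross_value'}
print(jump(values))  # [1, 2] when run on the raw values, without masking
```

Because the 60.0 reading is already labeled by the first check, the second check no longer sees the artificial jump back to 14.5.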

The dataset is updated in place.

Parameters:
  • obstype (String, optional) – Name of the observation type to apply the checks on. The default is ‘temp’.

  • gross_value (Bool, optional) – If True, the gross_value check is applied; if False, it is skipped. The default is True.

  • persistance (Bool, optional) – If True, the persistance check is applied; if False, it is skipped. The default is True.

  • repetitions (Bool, optional) – If True, the repetitions check is applied; if False, it is skipped. The default is True.

  • step (Bool, optional) – If True, the step check is applied; if False, it is skipped. The default is True.

  • window_variation (Bool, optional) – If True, the window_variation check is applied; if False, it is skipped. The default is True.

Return type:

None.

Notes

A schematic description of the quality control checks.

Gross value check#

This check looks for outliers based on unrealistic values.

  1. Find observations that fall below a minimum or exceed a maximum value threshold.

  2. These observations are labeled as outliers.
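The two steps above can be sketched in plain Python as follows. This is only an illustration: the function name and the threshold values are assumptions, not the toolkit's defaults.

```python
def gross_value_outliers(values, min_value, max_value):
    """Return the indices of values below min_value or above max_value.

    Missing values (None) are skipped.
    """
    return [i for i, v in enumerate(values)
            if v is not None and (v < min_value or v > max_value)]


# Hypothetical temperature series in degrees Celsius
temps = [12.3, 55.0, 14.1, -40.2, None, 15.0]
flagged = gross_value_outliers(temps, min_value=-15.0, max_value=39.0)
print(flagged)  # [1, 3]
```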

Persistence check#

Test whether observations change over a specific period.

  1. Find the stations whose maximum assumed observation frequency is compatible with the moving window, i.e. for which the minimum number of records can fit inside the window. The window size is defined by a duration.

  2. Subset to those stations.

  3. For each station, a moving window scan checks whether there is variation in the observations (NaNs are excluded). The check is only applied to windows that contain at least a threshold number of records.

  4. After the scan, all records found in the windows without variation are labeled as outliers.
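A minimal plain-Python sketch of these steps for a single station is given below. Several details are assumptions made for illustration: the function names, the reading of step 1 as "resolution times minimum records must fit in the window", and the choice of a forward-looking, half-open window starting at each record.

```python
from datetime import datetime, timedelta


def window_fits_min_records(resolution, window, min_records):
    """One reading of step 1: can min_records records fit inside the window?"""
    return resolution * min_records <= window


def persistence_outliers(records, window, min_records):
    """records: list of (timestamp, value) sorted by time; value may be None.

    Returns the sorted indices of all records found in windows without variation.
    """
    flagged = set()
    for start_idx, (start, _) in enumerate(records):
        end = start + window  # assumed window: [start, start + window)
        in_window = [i for i in range(start_idx, len(records))
                     if records[i][0] < end and records[i][1] is not None]
        if len(in_window) < min_records:
            continue  # not enough records to validate this window
        if len({records[i][1] for i in in_window}) == 1:
            flagged.update(in_window)  # no variation in this window
    return sorted(flagged)


t0 = datetime(2022, 9, 1)
values = [10.0, 10.0, 10.0, 10.0, 11.2, 11.5, 11.1]
records = [(t0 + timedelta(hours=i), v) for i, v in enumerate(values)]

usable = window_fits_min_records(timedelta(hours=1), timedelta(hours=3), 3)
flagged = persistence_outliers(records, window=timedelta(hours=3), min_records=3)
print(usable)   # True
print(flagged)  # [0, 1, 2, 3]
```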

Repetitions check#

Test whether observations change after a number of records.

  1. For each station, make a group of consecutive records for which the values do not change.

  2. Filter those groups that have more records than the maximum valid repetitions.

  3. All the records in these groups are labeled as outliers.

Note

The repetitions check is similar to the persistence check, but not identical. The persistence check uses thresholds that are meteorologically based (i.e. the moving window is defined by a duration), in contrast to the repetitions check, whose thresholds are instrumentally based (i.e. the “window” is defined by a number of records).
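The grouping of consecutive equal values in the repetitions check can be sketched with `itertools.groupby`. The function name and the threshold are hypothetical, and the sketch assumes a single station's series without missing values.

```python
from itertools import groupby


def repetition_outliers(values, max_valid_repetitions):
    """Return the indices of records in runs longer than max_valid_repetitions."""
    flagged = []
    index = 0
    for _, group in groupby(values):  # groups consecutive equal values
        run_length = len(list(group))
        if run_length > max_valid_repetitions:
            flagged.extend(range(index, index + run_length))
        index += run_length
    return flagged


values = [5.1, 5.1, 5.1, 5.1, 6.0, 6.0, 7.2]
flagged = repetition_outliers(values, max_valid_repetitions=3)
print(flagged)  # [0, 1, 2, 3]
```

Here the threshold counts records, so the result does not depend on the time between observations, which is the contrast with the persistence check described in the note above.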

Step check#

Test that observations do not produce unphysical spikes in the time series.

  1. Iterate over all the stations.

  2. Get the observations of the station (i.e. drop the previously labeled outliers, which are represented by NaNs).

  3. Find the observations for which:

    • The increase between two consecutive records is larger than the threshold. This threshold is defined by a maximum increase per second multiplied by the timedelta (in seconds) between the consecutive records.

    • The decrease between two consecutive records is larger than the equivalent threshold, defined by a maximum decrease per second.

  4. The found observations are labeled as outliers.

Note

In general, for temperatures, the decrease threshold is set less stringently than the increase threshold. This is because a sudden temperature drop is meteorologically more common than a sudden increase, which is often the result of a radiation error.
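The threshold scaling of the step check can be sketched for one station as below. This is an illustration under assumptions: the function name and rates are hypothetical, both rates are given as positive magnitudes, and the later record of each offending pair is the one flagged.

```python
from datetime import datetime, timedelta


def step_outliers(records, max_increase_per_second, max_decrease_per_second):
    """records: list of (timestamp, value) sorted by time; value may be None.

    Returns the indices of records that follow a too-large increase or decrease.
    """
    # Drop previously labeled outliers / missing values (None)
    valid = [(i, t, v) for i, (t, v) in enumerate(records) if v is not None]
    flagged = []
    for (_, t_prev, v_prev), (i, t_cur, v_cur) in zip(valid, valid[1:]):
        seconds = (t_cur - t_prev).total_seconds()
        change = v_cur - v_prev
        if change > max_increase_per_second * seconds:
            flagged.append(i)  # increase larger than the scaled threshold
        elif -change > max_decrease_per_second * seconds:
            flagged.append(i)  # decrease larger than the scaled threshold
    return flagged


t0 = datetime(2022, 9, 1)
values = [15.0, 15.5, 25.0, None, 24.0, 12.0]
records = [(t0 + timedelta(hours=i), v) for i, v in enumerate(values)]

flagged = step_outliers(records,
                        max_increase_per_second=8.0 / 3600.0,
                        max_decrease_per_second=10.0 / 3600.0)
print(flagged)  # [2, 5]
```

The pair around the missing record spans two hours, so its thresholds are twice as large as for the hourly pairs.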

Window variation check#

Test whether the variation found in a moving window exceeds the allowed increase or decrease.

  1. Find the stations whose maximum assumed observation frequency is compatible with the moving window, i.e. for which the minimum number of records can fit inside the window. The window size is defined by a duration.

  2. Compute the maximum increase and decrease thresholds for a window by multiplying the maximum increase (or decrease) per second by the window size in seconds.

  3. For each station, a moving window scan checks whether the maximum increase/decrease thresholds are exceeded. This is done by comparing the minimum and maximum values inside the window. The check is only applied to windows that contain at least a threshold number of records.

  4. After the scan, all records found in windows that exceed one of these thresholds are labeled as outliers.
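A plain-Python sketch of steps 2 to 4 for a single station follows. The function name, the rates, the forward-looking half-open window and the rule for choosing between the increase and decrease threshold (by whether the minimum occurs before the maximum) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def window_variation_outliers(records, window, min_records,
                              max_increase_per_second, max_decrease_per_second):
    """records: list of (timestamp, value) sorted by time; value may be None."""
    window_seconds = window.total_seconds()
    max_increase = max_increase_per_second * window_seconds  # step 2
    max_decrease = max_decrease_per_second * window_seconds
    flagged = set()
    for start_idx, (start, _) in enumerate(records):
        end = start + window  # assumed window: [start, start + window)
        in_window = [i for i in range(start_idx, len(records))
                     if records[i][0] < end and records[i][1] is not None]
        if len(in_window) < min_records:
            continue  # too few records to validate this window
        vals = [records[i][1] for i in in_window]
        spread = max(vals) - min(vals)
        # Assumption: minimum before maximum counts as an increase
        rising = vals.index(min(vals)) < vals.index(max(vals))
        limit = max_increase if rising else max_decrease
        if spread > limit:
            flagged.update(in_window)  # step 4: label the whole window
    return sorted(flagged)


t0 = datetime(2022, 9, 1)
values = [15.0, 16.0, 17.0, 45.0, 18.0, 19.0]
records = [(t0 + timedelta(hours=i), v) for i, v in enumerate(values)]

flagged = window_variation_outliers(records,
                                    window=timedelta(hours=3),
                                    min_records=3,
                                    max_increase_per_second=8.0 / 3600.0,
                                    max_decrease_per_second=10.0 / 3600.0)
print(flagged)  # [1, 2, 3, 4]
```

With these made-up rates the window thresholds are about 24 (increase) and 30 (decrease), so the windows containing the rise to 45.0 are flagged while the window that only contains the drop of 27 is not.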

Examples

>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...                         input_data_file=metobs_toolkit.demo_datafile,
...                         input_metadata_file=metobs_toolkit.demo_metadatafile,
...                         template_file=metobs_toolkit.demo_template,
...                         )
>>> dataset.import_data_from_file()
>>> dataset.coarsen_time_resolution(freq='1h')
>>>
>>> # Update some temperature QC settings
>>> dataset.update_qc_settings(obstype='temp',
...                            gross_value_max_value=42.,
...                            persis_time_win_to_check='4h',
...                            buddy_min_std=1.5)
>>>
>>> # Apply quality control on the temperature observations
>>> dataset.apply_quality_control(obstype='temp')
>>> dataset
Dataset instance containing:
     *28 stations
     *['temp', 'humidity', 'wind_speed', 'wind_direction'] observation types
     *10080 observation records
     *1676 records labeled as outliers
     *0 gaps
     *3 missing observations
     *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration:  14 days 23:00:00)
     *time zone of the records: UTC
     *Coordinates are available for all stations.