metobs_toolkit.Dataset.apply_buddy_check

Dataset.apply_buddy_check(obstype='temp', use_constant_altitude=False, haversine_approx=True, metric_epsg='31370')

Apply the buddy check on the observations.

The buddy check compares each observation against the observations of its neighbours (its buddies), which are the stations within a specified radius. An observation is flagged when the absolute difference between it and the mean of its buddies' observations, normalized by the standard deviation of those buddies' observations, exceeds a predefined threshold.
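
As a compact restatement of the criterion above, the test statistic can be written as follows. The symbols are introduced here for illustration only and are not names used by the toolkit: x is the observation under test, \bar{x}_b and \sigma_b are the mean and standard deviation of the buddies' observations, and \chi_{thr} is the predefined threshold.

```latex
\chi = \frac{\lvert x - \bar{x}_{b} \rvert}{\sigma_{b}},
\qquad
x \text{ is flagged as an outlier if } \chi > \chi_{\mathrm{thr}}
```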

This check is based on the buddy check from titanlib; see the titanlib documentation for details on that implementation.

The observation and outliers attributes will be updated accordingly.

Parameters:
  • obstype (String, optional) – Name of the observation type to apply the check to. The default is ‘temp’.

  • use_constant_altitude (bool, optional) – Use a constant altitude for all stations. The default is False.

  • haversine_approx (bool, optional) – Use the haversine approximation (which treats the Earth as a sphere) to calculate distances between stations. The default is True.

  • metric_epsg (str, optional) – EPSG code of the metric CRS in which distances are calculated. Only used when haversine_approx is False. A metric CRS gives a better distance approximation, but it is not globally applicable. The default is ‘31370’ (which is suitable for Belgium).

Return type:

None.

Notes

A schematic step-by-step description of the buddy check:

  1. A distance matrix is constructed containing the distances between all pairs of stations. This is done using the haversine approximation, or by first converting the Coordinate Reference System (CRS) to a metric one, specified by an EPSG code.

  2. A set of all (spatial) buddies per station is created by filtering out all stations that are too far away.

  3. The buddies are further filtered based on altitude differences with respect to the reference station.

  4. For each station:

    • Observations of buddies are extracted from all observations.

    • These observations are corrected for altitude differences by assuming a constant lapse rate.

    • For each reference record, the mean, standard deviation (std), and sample size of the corrected buddies’ observations are computed.

    • If the std is lower than the minimum std, it is replaced by the minimum std.

    • Chi values are calculated for all reference records.

    • If the Chi value is larger than the std_threshold, the record is marked as an outlier; otherwise it is accepted.
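
The steps above can be sketched in plain Python for a single timestamp. This is a minimal illustration under stated assumptions, not the toolkit's or titanlib's implementation: the function names (haversine_m, buddy_check), the parameters radius_m, max_alt_diff_m, lapse_rate and min_buddies, the use of the sample standard deviation, and the demo stations are all hypothetical. A record is flagged when its Chi value exceeds the threshold, in line with the description of the check at the top of this page.

```python
import math
from statistics import mean, stdev

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buddy_check(stations, obs, radius_m, max_alt_diff_m, lapse_rate,
                min_std, threshold, min_buddies=2):
    """Return {station: True if flagged as outlier} for one timestamp.

    stations: {name: (lat_deg, lon_deg, altitude_m)}
    obs:      {name: observed value}
    lapse_rate is dT/dz in K per metre (negative in a standard atmosphere).
    """
    flags = {}
    for ref, (lat, lon, alt) in stations.items():
        # Steps 1-3: keep stations within the radius and altitude difference.
        buddies = [
            name for name, (blat, blon, balt) in stations.items()
            if name != ref
            and haversine_m(lat, lon, blat, blon) <= radius_m
            and abs(balt - alt) <= max_alt_diff_m
        ]
        if len(buddies) < min_buddies:
            flags[ref] = False  # assumption: too few buddies means no test
            continue
        # Step 4: bring buddy observations to the reference altitude.
        corrected = [obs[b] + lapse_rate * (alt - stations[b][2])
                     for b in buddies]
        # Sample std is an assumption; it is floored at min_std.
        std = max(stdev(corrected), min_std)
        chi = abs(obs[ref] - mean(corrected)) / std
        flags[ref] = chi > threshold  # large Chi value -> outlier
    return flags


# Hypothetical demo: four nearby stations, one with a suspicious reading.
stations = {
    "A": (51.00, 3.70, 10.0),
    "B": (51.01, 3.70, 12.0),
    "C": (51.00, 3.71, 8.0),
    "D": (51.01, 3.71, 10.0),
}
obs = {"A": 15.0, "B": 15.2, "C": 14.9, "D": 25.0}
flags = buddy_check(stations, obs, radius_m=5000.0, max_alt_diff_m=100.0,
                    lapse_rate=-0.0065, min_std=1.0, threshold=3.0)
print(flags)  # only station "D" is flagged
```

In this demo the three consistent stations have a spread far below min_std, so the floor of 1.0 is what keeps the Chi value of station "D" finite and interpretable. Without the floor, very similar buddy readings would make even small deviations look extreme.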

Examples

>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...                         input_data_file=metobs_toolkit.demo_datafile,
...                         input_metadata_file=metobs_toolkit.demo_metadatafile,
...                         template_file=metobs_toolkit.demo_template,
...                         )
>>> dataset.import_data_from_file()
>>> dataset.coarsen_time_resolution(freq='1h')
>>>
>>> # Update some temperature QC settings
>>> dataset.update_qc_settings(obstype='temp',
...                            buddy_min_std=1.5,
...                            buddy_threshold=3.2)
>>>
>>> # Apply buddy check on the temperature observations
>>> dataset.apply_buddy_check(obstype='temp',
...                           use_constant_altitude=True)
>>> dataset
Dataset instance containing:
     *28 stations
     *['temp', 'humidity', 'wind_speed', 'wind_direction'] observation types
     *10080 observation records
     *69 records labeled as outliers
     *0 gaps
     *3 missing observations
     *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration:  14 days 23:00:00)
     *time zone of the records: UTC
     *Coordinates are available for all stations.