metobs_toolkit.Dataset.fill_missing_obs_linear

Dataset.fill_missing_obs_linear(obstype='temp')

Interpolate missing observations.

Fill in the missing observation records using linear interpolation. The missing_fill_df attribute of the Dataset is updated.
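As a worked formula, assume the interpolation is linear in time. The example output further down is consistent with this: the hourly filled temperatures for vlinder01 increase by a constant step of about 0.124 per hour. The symbols below are introduced here for illustration only. Take a missing record at time t, a leading record (t_l, v_l) just before it and a trailing record (t_r, v_r) just after it. The filled value would then be:

```latex
v(t) = v_l + \left(v_r - v_l\right)\frac{t - t_l}{t_r - t_l}, \qquad t_l < t < t_r
```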

Parameters:

obstype (string, optional) – Fieldname of the observation type whose missing records are filled. The default is ‘temp’.

Return type:

None.

Notes

A schematic description of the linear fill of missing observations:

  1. Iterate over all missing observations.

  2. Convert the missing observations into a set of missing records (depending on the time resolution of the observations).

  3. Find a leading record (the last observation before the missing observation) and a trailing record (the first observation after the missing observation).

  4. Fill the missing records by interpolating between the leading and trailing records.

  5. Update the missing records with the interpolated values (metobs_toolkit.Gap.gapfill_df).
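The steps above can be sketched in plain Python for a single station and a single observation type. This is a minimal, hypothetical illustration, not the toolkit's implementation. It assumes the observations are held in a dict that maps naive datetimes to values, instead of the DataFrame attributes the toolkit updates. It also assumes that records without both a leading and a trailing observation are left unfilled. The helper names to_records and fill_linear are invented here and are not part of the metobs_toolkit API.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Reference point for snapping timestamps onto a regular grid (naive datetimes assumed).
EPOCH = datetime(1970, 1, 1)


def to_records(missing_times, freq):
    """Hypothetical step 2: snap missing timestamps onto the record grid of resolution `freq`."""
    return sorted({t - (t - EPOCH) % freq for t in missing_times})


def fill_linear(obs, missing_times, freq):
    """Hypothetical sketch of steps 1-5 for one station and one observation type.

    obs maps datetime -> value for the records that are present. Returns a dict
    mapping each missing record to its interpolated value. Records lacking a
    leading or a trailing observation are skipped (an assumption of this sketch).
    """
    times = sorted(obs)
    filled = {}
    for t in to_records(missing_times, freq):  # steps 1 and 2
        if t in obs:
            continue  # record is present, nothing to fill
        i = bisect_left(times, t)
        if i == 0 or i == len(times):
            continue  # no leading or no trailing record available
        lead, trail = times[i - 1], times[i]  # step 3: last before, first after
        weight = (t - lead) / (trail - lead)  # timedelta / timedelta -> float
        filled[t] = obs[lead] + weight * (obs[trail] - obs[lead])  # step 4
    return filled  # step 5: the caller writes these values back


# Usage: two observed records with three missing hourly records in between.
obs = {datetime(2022, 9, 7, 22): 17.0, datetime(2022, 9, 8, 2): 19.0}
missing = [datetime(2022, 9, 7, 23, 10), datetime(2022, 9, 8, 0), datetime(2022, 9, 8, 1)]
filled = fill_linear(obs, missing, timedelta(hours=1))
for t, value in filled.items():
    print(t, value)  # 23:00 -> 17.5, 00:00 -> 18.0, 01:00 -> 18.5
```

The 23:10 timestamp is snapped to the 23:00 record before interpolation, which mirrors the idea of step 2. How the toolkit itself maps missing observations onto records may differ.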

Examples

>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...                         input_data_file=metobs_toolkit.demo_datafile,
...                         input_metadata_file=metobs_toolkit.demo_metadatafile,
...                         template_file=metobs_toolkit.demo_template,
...                         )
>>> dataset.import_data_from_file()
>>> dataset.coarsen_time_resolution(freq='1h')
>>>
>>> # Apply quality control on the temperature observations
>>> dataset.apply_quality_control(obstype='temp')  # Using the default QC settings
>>>
>>> # Interpret the outliers as missing/gaps
>>> dataset.update_gaps_and_missing_from_outliers(obstype='temp')
>>> dataset
Dataset instance containing:
      *28 stations
      *['temp', 'humidity', 'wind_speed', 'wind_direction'] observation types
      *10080 observation records
      *0 records labeled as outliers
      *2 gaps
      *1473 missing observations
      *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration:  14 days 23:00:00)
      *time zone of the records: UTC
      *Coordinates are available for all stations.
>>>
>>> # Fill the missing observations
>>> dataset.fill_missing_obs_linear(obstype='temp')
>>> dataset.missing_obs.get_info()
-------- Missing observations info --------
(Note: missing observations are defined on the frequency estimation of the native dataset.)
  * 1473 missing observations
  * For 28 stations
  * Missing observations are filled with interpolate for:
    temp:
                                            temp
name      datetime
vlinder01 2022-09-08 08:00:00+00:00  18.630303
          2022-09-07 23:00:00+00:00  17.512121
          2022-09-08 00:00:00+00:00  17.636364
          2022-09-08 02:00:00+00:00  17.884848
          2022-09-08 03:00:00+00:00  18.009091
...