metobs_toolkit.Dataset.fill_gaps_era5

Dataset.fill_gaps_era5(modeldata, method='debias', obstype='temp', overwrite_fill=False)

Fill the gaps using a diurnally debiased modeldata approach.

Parameters:
  • modeldata (metobs_toolkit.Modeldata) – The modeldata to use for the gapfill. This modeldata should contain the timeseries required to fill all gaps present in the dataset.

  • method ('debias', optional) – Specify which method to use. The default is ‘debias’.

  • obstype (String, optional) – Name of the observation type to apply gap filling to. The modeldata must contain this observation type as well. The default is ‘temp’.

  • overwrite_fill (bool, optional) – If a gap already has filled values and overwrite_fill is False, the gap filling of this gap is skipped. If set to True, the existing gapfill values and info are overwritten. The default is False.

Returns:

Gapfilldf – A DataFrame containing all gap-filled values and the method used.

Return type:

pandas.DataFrame

Notes

A schematic description of the fill_gaps_era5 method:

  1. Modeldata is converted to the timezone of the observations.

  2. Iterate over all gaps.
    • The gap is converted into a set of missing records (depending on the time resolution of the observations).

    • Find a leading and a trailing period. These periods are subsets of the observations before and after the gap, respectively. The size of each subset is set by a target size (in records) and a minimum size (in records). If a subset is smaller than its corresponding minimum size, the gap cannot be filled.

    • Modeldata, for the corresponding station and observation type, is extracted for the leading and trailing period.

    • Biases are computed by comparing the model data with the observations in the leading and trailing periods, grouping all records by their timestamp (i.e. into diurnal categories).

    • Modeldata for the missing records is extracted.

    • Weights (in [0, 1]) are computed for each gap record, representing its normalized distance in time to the beginning and end of the gap.

    • The modeldata at the missing records is then corrected by a weighted sum of the leading and trailing biases at the corresponding timestamp. In general, this restores the diurnal trend of the observations as closely as possible.

  3. The gap is updated with the gap-filled values (metobs_toolkit.Gap.gapfill_df).
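The per-gap steps above can be sketched in plain pandas as follows. This is a minimal illustration under stated assumptions, not the metobs_toolkit implementation: the function `debias_fill_gap` and its arguments are hypothetical; the modeldata is assumed to be already converted to the observation timezone (step 1) and available at every observation timestamp; the diurnal category is assumed to be the minute of the day; the bias is taken as model minus observation; and a diurnal category that is absent from a period is assumed to fall back to that period's mean bias.

```python
import numpy as np
import pandas as pd


def debias_fill_gap(obs, model, gap_start, gap_end, freq,
                    target_records=24, min_records=6):
    """Fill one gap with diurnally debiased model data (hypothetical sketch).

    obs   -- observations on a regular DatetimeIndex; gap records are NaN or absent.
    model -- model values in the same timezone as obs, available at every
             timestamp needed (leading period, gap and trailing period).
    Returns a Series of filled values, or None if a period is too short.
    """
    # Convert the gap into the set of missing record timestamps.
    missing = pd.date_range(gap_start, gap_end, freq=freq)

    # Leading/trailing periods: up to target_records valid observations
    # before and after the gap.
    valid = obs.dropna()
    leading = valid[valid.index < missing[0]].iloc[-target_records:]
    trailing = valid[valid.index > missing[-1]].iloc[:target_records]
    if len(leading) < min_records or len(trailing) < min_records:
        return None  # too few observations: the gap cannot be filled

    def minute_of_day(index):
        # Diurnal category of each timestamp (assumption: minute of the day).
        return np.asarray(index.hour * 60 + index.minute)

    def diurnal_bias(period):
        # Bias (model minus observation), averaged per diurnal category.
        bias = model.reindex(period.index) - period
        return bias.groupby(minute_of_day(period.index)).mean()

    gap_keys = minute_of_day(missing)
    lead_bias, trail_bias = diurnal_bias(leading), diurnal_bias(trailing)
    # Assumption: categories absent from a period use that period's mean bias.
    lead_b = lead_bias.reindex(gap_keys).fillna(lead_bias.mean()).to_numpy()
    trail_b = trail_bias.reindex(gap_keys).fillna(trail_bias.mean()).to_numpy()

    # Weights in [0, 1]: normalized time distance between the last leading
    # observation and the first trailing observation.
    t0, t1 = leading.index[-1], trailing.index[0]
    w_trail = np.asarray((missing - t0) / (t1 - t0), dtype=float)
    w_lead = 1.0 - w_trail

    # Correct the model by the weighted sum of leading and trailing biases.
    corrected = model.reindex(missing).to_numpy() - (w_lead * lead_b + w_trail * trail_b)
    return pd.Series(corrected, index=missing, name="gapfill")


# Usage on synthetic hourly data: the model has a constant +2 bias,
# so the debiased fill should recover the true values inside the gap.
idx = pd.date_range("2024-01-01", periods=72, freq="60min")
truth = pd.Series(10 + 5 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24), index=idx)
model = truth + 2.0
obs = truth.copy()
obs.loc["2024-01-02 00:00":"2024-01-02 05:00"] = np.nan
filled = debias_fill_gap(obs, model, "2024-01-02 00:00", "2024-01-02 05:00", "60min")
print(np.allclose(filled, truth.loc[filled.index]))
```

Measuring the weights over the span from the last leading observation to the first trailing observation keeps them strictly inside (0, 1) for every gap record, so each filled value blends both biases. The toolkit itself may normalize the distance differently.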

Note

A scientific publication on the performance of this technique is expected.

Examples

import metobs_toolkit

your_dataset = metobs_toolkit.Dataset()
your_dataset.update_settings(
    input_data_file=metobs_toolkit.demo_datafile, # path to the data file
    input_metadata_file=metobs_toolkit.demo_metadatafile,
    template_file=metobs_toolkit.demo_template,
)
# Specify the gap definition. This must be done BEFORE importing the data.
your_dataset.update_qc_settings(gapsize_in_records=20)

your_dataset.import_data_from_file()

# Update the settings (definition of the periods used to calculate the biases)
your_dataset.update_gap_and_missing_fill_settings(
    gap_debias_prefered_leading_period_hours=24,
    gap_debias_prefered_trailing_period_hours=24,
    gap_debias_minimum_leading_period_hours=6,
    gap_debias_minimum_trailing_period_hours=6,
)
# (As a demonstration, we will fill the gaps of a single station. The following
# functions can also be applied directly to the dataset.)
your_station = your_dataset.get_station('vlinder05')

# Get ERA5 modeldata at the location of your station and for its period.
ERA5_modeldata = your_station.get_modeldata(modelname='ERA5_hourly',
                                            obstype='temp')

# Use the debias method to fill the gaps
gapfill_df = your_station.fill_gaps_era5(modeldata=ERA5_modeldata,
                                         obstype='temp')