metobs_toolkit.Dataset.combine_all_to_obsspace

Dataset.combine_all_to_obsspace(repr_outl_as_nan=False, overwrite_outliers_by_gaps_and_missing=True)

Make one dataframe with all observations and their labels.

Combine all observations, outliers, missing observations and gaps into one DataFrame. All observation types are combined and a label is added in a separate column.

When gaps and missing records are updated from outliers, one has to choose whether to represent these records as outliers or as gaps/missing. There cannot be duplicates in the returned dataframe.

By default, the observation values of the outliers are kept. One can choose to use these values or to represent them as NaNs.
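The interplay of the two options described above can be illustrated with a small pure-Python sketch. This is not the toolkit's implementation: the function `combine_records`, the representation strings `"outlier"` and `"gap"`, the use of NaN as the value of a gap, and the toy values are all assumptions made for illustration. Only the two keyword names and the documented behaviour are taken from this page.

```python
import math


def combine_records(observations, outliers, gaps,
                    repr_outl_as_nan=False,
                    overwrite_outliers_by_gaps_and_missing=True):
    """Merge three record collections into one mapping without duplicate keys.

    Hypothetical helper: each key is a (name, datetime, obstype) tuple and
    each result is a (value, representation) tuple.
    """
    combined = {}
    for key, value in observations.items():
        combined[key] = (value, "observation")
    for key, value in outliers.items():
        # Optionally hide the original outlier value behind NaN.
        shown = math.nan if repr_outl_as_nan else value
        combined[key] = (shown, "outlier")
    for key in gaps:
        # A record flagged as both outlier and gap keeps exactly one
        # representation; the flag decides which one wins.
        if key in outliers and not overwrite_outliers_by_gaps_and_missing:
            continue
        combined[key] = (math.nan, "gap")  # assumption: gaps carry no value
    return combined


# Toy records (made-up values, not toolkit output).
observations = {("vlinder01", "2022-09-01 00:00", "temp"): 18.8}
outliers = {("vlinder01", "2022-09-01 01:00", "temp"): 45.0}
gaps = [
    ("vlinder01", "2022-09-01 01:00", "temp"),  # also present in outliers
    ("vlinder01", "2022-09-01 02:00", "temp"),
]

default = combine_records(observations, outliers, gaps)
keep_outliers = combine_records(
    observations, outliers, gaps,
    overwrite_outliers_by_gaps_and_missing=False,
)
```

In this sketch both results hold three records. They differ only for the 01:00 record, which is represented as a gap in `default` and as an outlier (with its original value) in `keep_outliers`.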

Parameters:
  • repr_outl_as_nan (bool, optional) – If True, NaNs are used for the values of the outliers. The default is False.

  • overwrite_outliers_by_gaps_and_missing (bool, optional) – If True, records that are labeled as both gap/missing and outlier are labeled as gaps/missing. This only has an effect when the gaps/missing observations are updated from the outliers. The default is True.

Returns:
  combdf – A dataframe containing a continuous time resolution of records, where each record is labeled.

Return type:
  pandas.DataFrame

Examples

>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...                         input_data_file=metobs_toolkit.demo_datafile,
...                         input_metadata_file=metobs_toolkit.demo_metadatafile,
...                         template_file=metobs_toolkit.demo_template,
...                         )
>>> dataset.import_data_from_file()
>>> dataset.coarsen_time_resolution(freq='1h')
>>>
>>> # Apply quality control on the temperature observations
>>> dataset.apply_quality_control(obstype='temp')  # Using the default QC settings
>>> dataset
Dataset instance containing:
     *28 stations
     *['temp', 'humidity', 'wind_speed', 'wind_direction'] observation types
     *10080 observation records
     *1676 records labeled as outliers
     *0 gaps
     *3 missing observations
     *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration:  14 days 23:00:00)
     *time zone of the records: UTC
     *Coordinates are available for all stations.
>>>
>>> # Combine all records to one dataframe in Observation-resolution
>>> overview_df = dataset.combine_all_to_obsspace()
>>> overview_df.head(12)
                                                        value  ... toolkit_representation
name      datetime                  obstype                    ...
vlinder01 2022-09-01 00:00:00+00:00 humidity        65.000000  ...            observation
                                    temp            18.800000  ...            observation
                                    wind_direction  65.000000  ...            observation
                                    wind_speed       1.555556  ...            observation
          2022-09-01 01:00:00+00:00 humidity        65.000000  ...            observation
                                    temp            18.400000  ...            observation
                                    wind_direction  55.000000  ...            observation
                                    wind_speed       1.416667  ...            observation
          2022-09-01 02:00:00+00:00 humidity        68.000000  ...            observation
                                    temp            17.100000  ...            observation
                                    wind_direction  45.000000  ...            observation
                                    wind_speed       1.583333  ...            observation

[12 rows x 3 columns]
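Once the combined dataframe is available, it can be sliced with ordinary pandas operations. The sketch below builds a small stand-in frame with the same index levels (`name`, `datetime`, `obstype`) and the two columns visible in the output above, so that it runs without the toolkit. The values `"outlier"` and `"gap"` in `toolkit_representation` and the toy numbers are assumptions. The real frame also has a third column that is hidden in the truncated output and is left out here.

```python
import pandas as pd

# Stand-in for the frame returned by combine_all_to_obsspace() (toy data).
index = pd.MultiIndex.from_tuples(
    [
        ("vlinder01", pd.Timestamp("2022-09-01 00:00", tz="UTC"), "temp"),
        ("vlinder01", pd.Timestamp("2022-09-01 01:00", tz="UTC"), "temp"),
        ("vlinder01", pd.Timestamp("2022-09-01 02:00", tz="UTC"), "temp"),
    ],
    names=["name", "datetime", "obstype"],
)
toy_df = pd.DataFrame(
    {
        "value": [18.8, 18.4, float("nan")],
        # "outlier" and "gap" are assumed values; only "observation"
        # appears in the example output above.
        "toolkit_representation": ["observation", "outlier", "gap"],
    },
    index=index,
)

# Count how many records fall in each representation.
counts = toy_df["toolkit_representation"].value_counts()

# Select one observation type; xs() drops the 'obstype' index level.
temp_records = toy_df.xs("temp", level="obstype")

# Keep only the records that are plain observations.
only_observations = temp_records[
    temp_records["toolkit_representation"] == "observation"
]
```

The same selections should carry over to `overview_df` from the example above, provided its index levels and column names match the printed output.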