metobs_toolkit.Dataset.combine_all_to_obsspace
- Dataset.combine_all_to_obsspace(repr_outl_as_nan=False, overwrite_outliers_by_gaps_and_missing=True)
Make one dataframe with all observations and their labels.
Combine all observations, outliers, missing observations and gaps into one DataFrame. All observation types are combined, and a label is added in a separate column.
When gaps and missing records are updated from outliers, one has to choose whether to represent these records as outliers or as gaps/missing, because the returned dataframe cannot contain duplicates.
By default the observation values of the outliers are kept; one can choose to use these values or NaNs.
- Parameters:
repr_outl_as_nan (bool, optional) – If True, NaNs are used for the values of the outliers. The default is False.
overwrite_outliers_by_gaps_and_missing (bool, optional) – If True, records that are labeled as both gap/missing and outlier are labeled as gap/missing. This only has an effect when the gaps/missing observations are updated from the outliers. The default is True.
- Returns:
combdf – A dataframe with a continuous time resolution of records, where each record is labeled.
- Return type:
pandas.DataFrame
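The interaction between the two parameters can be illustrated with a small plain-Python sketch. This is not the toolkit's implementation: the function name `combine_records`, the label strings `"outlier"` and `"gap"`, and the outlier values are hypothetical, and missing observations are treated the same as gaps for brevity. It only mirrors the behavior described above, where each record appears once and carries one label.

```python
from math import isnan, nan


def combine_records(observations, outliers, gaps,
                    repr_outl_as_nan=False,
                    overwrite_outliers_by_gaps_and_missing=True):
    """Merge records into one mapping of key -> (value, label).

    `observations` and `outliers` map a (name, datetime, obstype) key to a
    value; `gaps` is a set of keys. Every key ends up with exactly one entry.
    (Hypothetical helper, for illustration only.)
    """
    combined = {key: (value, "observation")
                for key, value in observations.items()}

    # Outliers keep their measured value unless NaN is requested.
    for key, value in outliers.items():
        combined[key] = (nan if repr_outl_as_nan else value, "outlier")

    # A key that is both outlier and gap is relabeled as a gap only when
    # overwriting is enabled; otherwise the outlier entry is kept.
    for key in gaps:
        if key in outliers and not overwrite_outliers_by_gaps_and_missing:
            continue
        combined[key] = (nan, "gap")

    return combined


# Made-up records for one station; the 02:00 record is both outlier and gap.
t0 = ("vlinder01", "2022-09-01 00:00", "temp")
t1 = ("vlinder01", "2022-09-01 01:00", "temp")
t2 = ("vlinder01", "2022-09-01 02:00", "temp")
t3 = ("vlinder01", "2022-09-01 03:00", "temp")

obs = {t0: 18.8}
outl = {t1: 45.0, t2: 50.0}
gaps = {t2, t3}

default = combine_records(obs, outl, gaps)
keep_outliers = combine_records(
    obs, outl, gaps, overwrite_outliers_by_gaps_and_missing=False)
nan_outliers = combine_records(obs, outl, gaps, repr_outl_as_nan=True)

print(default[t2][1])        # label of the doubly-flagged record
print(keep_outliers[t2])     # same record when outliers take priority
print(isnan(nan_outliers[t1][0]))
```

With the defaults, the doubly-flagged record is reported as a gap; with `overwrite_outliers_by_gaps_and_missing=False` it stays an outlier and keeps its value. In both cases the result has one entry per key.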
Examples
>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...     input_data_file=metobs_toolkit.demo_datafile,
...     input_metadata_file=metobs_toolkit.demo_metadatafile,
...     template_file=metobs_toolkit.demo_template,
... )
>>> dataset.import_data_from_file()
>>> dataset.coarsen_time_resolution(freq='1h')
>>>
>>> # Apply quality control on the temperature observations
>>> dataset.apply_quality_control(obstype='temp')  # Using the default QC settings
>>> dataset
Dataset instance containing:
     *28 stations
     *['temp', 'humidity', 'wind_speed', 'wind_direction'] observation types
     *10080 observation records
     *1676 records labeled as outliers
     *0 gaps
     *3 missing observations
     *records range: 2022-09-01 00:00:00+00:00 --> 2022-09-15 23:00:00+00:00 (total duration: 14 days 23:00:00)
     *time zone of the records: UTC
     *Coordinates are available for all stations.
>>>
>>> # Combine all records to one dataframe in Observation-resolution
>>> overview_df = dataset.combine_all_to_obsspace()
>>> overview_df.head(12)
                                                        value  ... toolkit_representation
name      datetime                  obstype                    ...
vlinder01 2022-09-01 00:00:00+00:00 humidity        65.000000  ...            observation
                                    temp            18.800000  ...            observation
                                    wind_direction  65.000000  ...            observation
                                    wind_speed       1.555556  ...            observation
          2022-09-01 01:00:00+00:00 humidity        65.000000  ...            observation
                                    temp            18.400000  ...            observation
                                    wind_direction  55.000000  ...            observation
                                    wind_speed       1.416667  ...            observation
          2022-09-01 02:00:00+00:00 humidity        68.000000  ...            observation
                                    temp            17.100000  ...            observation
                                    wind_direction  45.000000  ...            observation
                                    wind_speed       1.583333  ...            observation
[12 rows x 3 columns]