metobs_toolkit.Dataset.import_data_from_file

Dataset.import_data_from_file(input_data_file=None, input_metadata_file=None, template_file=None, freq_estimation_method=None, freq_estimation_simplify=None, freq_estimation_simplify_error=None, kwargs_data_read={}, kwargs_metadata_read={})

Read observations from a CSV file.

The paths (data, metadata and template) are stored in the settings if Dataset.update_settings() is called on this object. These paths can be updated by passing them as arguments to this method.

The input data (and metadata) are interpreted using a template (a JSON file).

An estimation of the observational frequency is made per station. This is used to find missing observations and gaps.

The Dataset attributes are set and the following checks are executed:
  • Duplicate check

  • Invalid input check

  • Find missing observations

  • Find gaps
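
As a rough illustration of what the duplicate check and the search for missing observations involve, here is a minimal standard-library sketch. It is not the toolkit's implementation: the helper names `find_duplicates` and `find_missing`, the sample records, and the 10-minute frequency are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_duplicates(timestamps):
    """Return the timestamps that occur more than once, sorted."""
    counts = Counter(timestamps)
    return sorted(ts for ts, n in counts.items() if n > 1)


def find_missing(timestamps, freq):
    """Return the timestamps expected at spacing `freq` but not observed."""
    observed = set(timestamps)
    missing = []
    current, last = min(observed), max(observed)
    while current <= last:
        if current not in observed:
            missing.append(current)
        current += freq
    return missing


# Hypothetical records for one station, nominally every 10 minutes.
start = datetime(2022, 9, 1, 0, 0)
records = [start + timedelta(minutes=m) for m in (0, 10, 10, 20, 50, 60)]

duplicates = find_duplicates(records)
missing = find_missing(records, timedelta(minutes=10))
print(duplicates)  # the 00:10 record appears twice
print(missing)     # the 00:30 and 00:40 records are absent
```

The second helper only works once a frequency is known, which is why the frequency is estimated per station before missing observations and gaps are searched for.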

Parameters:
  • input_data_file (string, optional) – Path to the input data file with observations. If None, the input data path in the settings is used.

  • input_metadata_file (string, optional) – Path to the input metadata file. If None, the input metadata path in the settings is used.

  • template_file (string, optional) – Path to the template (json) file to be used on the observations and metadata. If None, the template path in the settings is used.

  • freq_estimation_method ('highest' or 'median', optional) – Select which method to use for the frequency estimation. If 'highest', the highest occurring frequency is used. If 'median', the median of the occurring frequencies is used. If None, the method stored in Dataset.settings.time_settings['freq_estimation_method'] is used. The default is None.

  • freq_estimation_simplify (bool, optional) – If True, the likely frequency is converted to round hours or round minutes. The 'freq_estimation_simplify_error' argument is used as a constraint; if the constraint is not met, the simplification is not performed. If None, the value stored in Dataset.settings.time_settings['freq_estimation_simplify'] is used. The default is None.

  • freq_estimation_simplify_error (Timedelta or str, optional) – The tolerance string or object representing the maximum shift in time allowed when forming a simplified frequency estimate. Ex: '5min' is 5 minutes, '1h' is one hour. If None, the value stored in Dataset.settings.time_settings['freq_estimation_simplify_error'] is used. The default is None.

  • kwargs_data_read (dict, optional) – Keyword arguments collected in a dictionary to pass to the pandas.read_csv() function on the data file. The default is {}.

  • kwargs_metadata_read (dict, optional) – Keyword arguments collected in a dictionary to pass to the pandas.read_csv() function on the metadata file. The default is {}.
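
To make the three frequency-estimation parameters more concrete, the following standard-library sketch mimics their described behaviour. It is an illustration under stated assumptions, not the toolkit's code: the function `estimate_frequency` is hypothetical, "highest frequency" is assumed to mean the smallest interval between consecutive timestamps, and the simplification is assumed to try round hours before round minutes.

```python
from datetime import datetime, timedelta
from statistics import median


def estimate_frequency(timestamps, method="highest", simplify=True,
                       simplify_error=timedelta(minutes=2)):
    """Estimate the observation interval of one station (illustrative only)."""
    ordered = sorted(set(timestamps))
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not steps:
        raise ValueError("at least two distinct timestamps are required")

    if method == "highest":
        # Assumption: the highest frequency corresponds to the smallest step.
        freq = min(steps)
    elif method == "median":
        freq = timedelta(seconds=median(s.total_seconds() for s in steps))
    else:
        raise ValueError(f"unknown method: {method!r}")

    if simplify:
        # Try round hours first, then round minutes; keep the raw estimate
        # when neither candidate lies within the tolerance.
        for unit in (timedelta(hours=1), timedelta(minutes=1)):
            candidate = round(freq / unit) * unit
            if candidate > timedelta(0) and abs(candidate - freq) <= simplify_error:
                return candidate
    return freq


# Hypothetical, slightly jittered 10-minute series (offsets in seconds).
start = datetime(2022, 9, 1)
stamps = [start + timedelta(seconds=s) for s in (0, 598, 1199, 1799, 2400)]

raw = estimate_frequency(stamps, method="highest", simplify=False)
simplified = estimate_frequency(stamps, method="highest", simplify=True)
```

Here `raw` is the smallest observed step (598 seconds), while `simplified` snaps to 10 minutes because the 2-second shift lies within the tolerance. With a tolerance smaller than that shift, the raw estimate would be kept.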

Note

In practice, the default arguments will be sufficient for most applications.

Note

If options are present in the template, these will have priority over the arguments of this function.

Warning

All CSV data files must be in UTF-8 encoding. For most CSV files, this condition is already met. To make sure, in Microsoft Excel (or similar), you can choose to export as `CSV UTF-8`. If you encounter an error mentioning a “/ueff…” tag in a CSV file, it is often solved by converting the CSV to UTF-8.
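
If a spreadsheet program is not at hand, the conversion can also be done in Python. The sketch below is one possible approach using only the standard library; the helper name `convert_csv_to_utf8` and the file names are hypothetical, and the source encoding must be known or guessed by the user. Reading with the `utf-8-sig` codec drops a leading byte-order mark if one is present, which may be what triggers the error described above (an assumption, not something the toolkit documents).

```python
import tempfile
from pathlib import Path


def convert_csv_to_utf8(src, dst, source_encoding="utf-8-sig"):
    """Re-encode a text file as plain UTF-8, without a byte-order mark."""
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=source_encoding, newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding="utf-8", newline="") as outfile:
        outfile.write(text)


# Demonstration on a throwaway file that starts with a byte-order mark.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "raw.csv"
    dst = Path(tmp) / "clean.csv"
    src.write_bytes("\ufeffname,temp\nstation_a,12.5\n".encode("utf-8"))
    convert_csv_to_utf8(src, dst)
    converted = dst.read_bytes()
```

For a file saved in another encoding, pass it explicitly, for example `source_encoding="latin-1"`.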

Return type:

None.

Examples

>>> import metobs_toolkit
>>>
>>> # Import data into a Dataset
>>> dataset = metobs_toolkit.Dataset()
>>> dataset.update_settings(
...                         input_data_file=metobs_toolkit.demo_datafile,
...                         input_metadata_file=metobs_toolkit.demo_metadatafile,
...                         template_file=metobs_toolkit.demo_template,
...                         )
>>> dataset.import_data_from_file()