{ "cells": [ { "cell_type": "markdown", "id": "22a1f745-7d1f-4018-8426-8d5944d62488", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Mapping to the toolkit\n", "\n", "The MetObs-toolkit uses standard names and formats for your data. To use the toolkit,\n", "your observational data must be converted to these toolkit standards; this conversion is referred to as **mapping**.\n", "\n", "How the mapping is done is specified by a **template**. A template contains\n", "all the information needed to convert your tabular data to the toolkit standards.\n", "Because the structure of data files differs between networks, each data file needs\n", "a template that matches its structure. Templates are saved as .json files so that they can be reused and shared.\n", "\n", "This page explains how to construct a template." ] }, { "cell_type": "markdown", "id": "c64c4db4-f107-4c9f-8f94-2a490396d861", "metadata": {}, "source": [ "# Toolkit Standards\n", "\n", "The toolkit uses standard names for observation types and metadata. The standard observation types, with their descriptions and units, are listed below."
] }, { "cell_type": "code", "execution_count": 9, "id": "c090deb9-7f8c-4caa-9b75-83e877bbc042", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-cell", "remove-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The standard observations present in the Metobs toolkit\n", " ----------------------------------------------------- \n", "\n", "temp | 2mT passive | Celsius \n", "humidity | 2m relative humidity passive | % \n", "radiation_temp | 2m - Black globe | Celsius \n", "pressure | atmospheric pressure (at station) | pa \n", "pressure_at_sea_level | atmospheric pressure (at sea level) | pa \n", "precip | precipitation intensity | mm/m² \n", "precip_sum | Cummulated precipitation | mm/m² \n", "wind_speed | Average 2m 10-min windspeed | m/s \n", "wind_gust | wind gust | m/s \n", "wind_direction | Average 2m 10-min windspeed | ° from north (CW) \n" ] } ], "source": [ "# This code block is for illustration only; it has no practical use.\n", "from metobs_toolkit.miscellaneous import _tlk_print_standard_obstypes\n", "_tlk_print_standard_obstypes()" ] }, { "cell_type": "markdown", "id": "a2853977-d0fe-480a-8415-1d4a0b02e4a4", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "remove-input" ] }, "source": [ "## Data Structures\n", "\n", "To make a template, you must know which format your data is in. The toolkit can handle the following data structures:" ] }, { "cell_type": "markdown", "id": "a28b0898-075d-4d54-924f-0bfa914a7261", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Long-format\n", "Observations of all stations are stacked in rows, and one column holds the station name of each row."
] }, { "cell_type": "markdown", "id": "29a3e2d1-f64e-43a4-bf7e-7fc57c071902", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "| Timestamp | 2m Temperature | 2m Humidity | ID |\n", "| -------- | ------- | ------- | ------- |\n", "| 2022-06-07 13:20:00 | 16.4 | 77.3 | Station_A |\n", "| 2022-06-07 13:30:00 | 16.7 | 75.6 | Station_A |\n", "| 2022-06-07 13:20:00 | 18.3 | 68.9 | Station_B |\n", "| 2022-06-07 13:30:00 | 18.6 | 71.9 | Station_B |\n" ] }, { "cell_type": "markdown", "id": "f82027ef-a6b4-4346-b7a8-4a6dbcf5db94", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Single-station-format\n", "The same as the long format, but without a column for the station names. Be aware that the toolkit interprets all observations in such a file as coming from a single station." ] }, { "cell_type": "markdown", "id": "e9af576f-e359-4319-b965-259335dcd682", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "| Timestamp | 2m Temperature | 2m Humidity |\n", "| -------- | ------- | ------- |\n", "| 2022-06-07 13:20:00 | 16.4 | 77.3 |\n", "| 2022-06-07 13:30:00 | 16.7 | 75.6 |" ] }, { "cell_type": "markdown", "id": "24ee4503-2123-4646-9fc8-3469a9959284", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Wide-format\n", "Each column represents a different station, so a wide-format file holds a single observation type."
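, "\n",
 "\n",
 "For illustration, the temperature column of the long-format example above can be reshaped into the wide layout shown below. This is a minimal sketch that assumes pandas is installed: `DataFrame.pivot` is a pandas method, not a toolkit function, and the output file name `wide_temperature.csv` is hypothetical.\n",
 "\n",
 "```python\n",
 "import pandas as pd\n",
 "\n",
 "# Long-format example from above: one row per (timestamp, station) pair\n",
 "long_df = pd.DataFrame(\n",
 "    {\n",
 "        \"Timestamp\": [\"2022-06-07 13:20:00\", \"2022-06-07 13:30:00\",\n",
 "                      \"2022-06-07 13:20:00\", \"2022-06-07 13:30:00\"],\n",
 "        \"2m Temperature\": [16.4, 16.7, 18.3, 18.6],\n",
 "        \"2m Humidity\": [77.3, 75.6, 68.9, 71.9],\n",
 "        \"ID\": [\"Station_A\", \"Station_A\", \"Station_B\", \"Station_B\"],\n",
 "    }\n",
 ")\n",
 "\n",
 "# A wide-format file holds one observation type, so keep only the temperature.\n",
 "# Station names become the column headers, timestamps become the rows.\n",
 "wide_df = long_df.pivot(index=\"Timestamp\", columns=\"ID\", values=\"2m Temperature\")\n",
 "wide_df.columns.name = None  # drop the leftover \"ID\" label of the column axis\n",
 "\n",
 "# Save as a UTF-8 encoded CSV; the Timestamp index is written as the first column\n",
 "wide_df.to_csv(\"wide_temperature.csv\", encoding=\"utf-8\")\n",
 "print(wide_df)\n",
 "```"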
] }, { "cell_type": "markdown", "id": "e6f21701-65d3-4435-a5aa-7bacceea3362", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "| Timestamp | Station_A | Station_B |\n", "| -------- | ------- | ------- |\n", "| 2022-06-07 13:20:00 | 16.4 | 18.3 |\n", "| 2022-06-07 13:30:00 | 16.7 | 18.6 |" ] }, { "cell_type": "markdown", "id": "d2964ead-384c-4d79-85c5-2c2370e1c6ad", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Template creation\n", "\n", "Once your tabular data files are in long, wide, or single-station format and saved as .csv files, you can make a template." ] }, { "cell_type": "markdown", "id": "53c79a6e-1f4f-4563-ada2-e22c422d68c1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "**Note:** If you want to use a metadata file, make sure it is in wide format and saved as a .csv file." ] }, { "cell_type": "markdown", "id": "d4a30f28-2222-452f-b663-1061505d4432", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The fastest and simplest way to make a template is to use the `metobs_toolkit.build_template_prompt()` function." ] }, { "cell_type": "markdown", "id": "82a782d0-c7f0-4ecb-b438-759e90159515", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "```python\n", "import metobs_toolkit\n", "\n", "# Create a template by answering the interactive prompt\n", "metobs_toolkit.build_template_prompt()\n", "```" ] }, { "cell_type": "markdown", "id": "4fa480e3-7acb-498f-85bd-1c9666530b88", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "**Note:** When the prompt asks whether you need further help and you answer yes, it asks some additional questions. Once all information is given, the prompt prints a piece of code that you have to run to load your data into the toolkit." ] }, { "cell_type": "markdown", "id": "3c2cefaa-ff8f-4acf-ab7d-ccce3b47451f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "**Warning:** All CSV data files must be UTF-8 encoded. Most CSV files already are. To be sure, choose `CSV UTF-8` as the export format in Microsoft Excel (or a similar program).\n", "If you encounter an error mentioning a `\"\\ufeff...\"` string in a CSV file, converting the CSV to UTF-8 solves it." ] }, { "cell_type": "markdown", "id": "b6809658-7bfd-4749-ae7f-bd7ec321d5fd", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The `build_template_prompt()` function asks a series of questions and builds a template that matches your data file (and metadata file). The *template.json* file is stored at a location of your choice.\n", "\n", "To use this template, pass its file path as the `template_file` argument of the `update_settings()` method." ] }, { "cell_type": "code", "execution_count": 10, "id": "24d28b25-204d-4af9-8d3b-4c5920d6eda1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import metobs_toolkit\n", "\n", "your_dataset = metobs_toolkit.Dataset()  # initiate an empty dataset\n", "your_dataset.update_settings(\n", "    input_data_file=metobs_toolkit.demo_datafile,  # path to your data (csv) file\n", "    input_metadata_file=metobs_toolkit.demo_metadatafile,  # path to your metadata (csv) file\n", "    template_file=metobs_toolkit.demo_template,  # path to your template (json) file\n", ")\n" ] }, { "cell_type": "markdown", "id": "57e8e26f-6985-43da-af59-174dae62e9f4", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The template file is read when the `Dataset.import_data_from_file()` method is called. It is then converted to a `metobs_toolkit.Template` object, which is accessible through the `template` attribute of the dataset."
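, "\n",
 "\n",
 "`import_data_from_file()` also reads the data and metadata CSV files, so this is the step where the `\\ufeff` error mentioned in the warning above is likely to appear. The `\\ufeff` character is a UTF-8 byte order mark (BOM) at the start of the file. A minimal sketch for removing it with the Python standard library is given here; it assumes the file is otherwise valid UTF-8, and the helper `strip_utf8_bom` and the demo file names are hypothetical, not part of the toolkit.\n",
 "\n",
 "```python\n",
 "import codecs\n",
 "from pathlib import Path\n",
 "\n",
 "\n",
 "def strip_utf8_bom(src: str, dst: str) -> bool:\n",
 "    \"\"\"Copy a UTF-8 CSV file to dst without a leading byte order mark.\n",
 "\n",
 "    Returns True if a BOM was found and removed (hypothetical helper).\n",
 "    \"\"\"\n",
 "    data = Path(src).read_bytes()\n",
 "    had_bom = data.startswith(codecs.BOM_UTF8)  # BOM_UTF8 == b\"\\xef\\xbb\\xbf\"\n",
 "    if had_bom:\n",
 "        data = data[len(codecs.BOM_UTF8):]\n",
 "    # Raises UnicodeDecodeError if the file is not UTF-8 at all; such a file\n",
 "    # has to be re-exported as UTF-8 instead.\n",
 "    data.decode(\"utf-8\")\n",
 "    Path(dst).write_bytes(data)\n",
 "    return had_bom\n",
 "\n",
 "\n",
 "# Demo with hypothetical file names: write a small CSV that starts with a BOM\n",
 "Path(\"demo_with_bom.csv\").write_text(\"ID,temp\\nStation_A,16.4\\n\", encoding=\"utf-8-sig\")\n",
 "removed = strip_utf8_bom(\"demo_with_bom.csv\", \"demo_clean.csv\")\n",
 "print(removed)  # True\n",
 "print(Path(\"demo_clean.csv\").read_bytes()[:3])  # b'ID,'\n",
 "```"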
] }, { "cell_type": "code", "execution_count": 11, "id": "de4a7655-e7a8-4919-9a64-ec65bcfd4291", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "your_dataset.import_data_from_file()  # reads the data, metadata and template\n", "\n", "your_dataset.template\n" ] }, { "cell_type": "markdown", "id": "83400f16-d154-4be1-8acc-9c7b18021f87", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "An overview of the template can be printed by calling the `show()` method on the `Template` instance:" ] }, { "cell_type": "code", "execution_count": 12, "id": "29e187ca-19d1-43f5-9d08-4a04ea1b3154", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "------ Data obstypes map ---------\n", " * temp <---> Temperatuur \n", " (raw data in Celsius)\n", " (description: 2mT passive)\n", "\n", " * humidity <---> Vochtigheid \n", " (raw data in %)\n", " (description: 2m relative humidity passive)\n", "\n", " * wind_speed <---> Windsnelheid \n", " (raw data in km/h)\n", " (description: Average 2m 10-min windspeed)\n", "\n", " * wind_direction <---> Windrichting \n", " (raw data in ° from north (CW))\n", " (description: Average 2m 10-min windspeed)\n", "\n", "\n", "------ Data extra mapping info ---------\n", " * name column (data) <---> Vlinder\n", "\n", "------ Data timestamp map ---------\n", " * datetimecolumn <---> None \n", " * time_column <---> Tijd (UTC) \n", " * date_column <---> Datum \n", " * fmt <---> %Y-%m-%d %H:%M:%S\n", " * Timezone <---> None\n",
"\n", "------ Metadata map ---------\n", " * name <---> Vlinder \n", " * lat <---> lat \n", " * lon <---> lon \n", " * school <---> school \n" ] } ], "source": [ "your_dataset.template.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 5 }