Example of creating a timeseries dataset in xarray
Example of creating a simple timeseries in xarray with attributes for S-ENDA
In [ ]:
%pip install xarray netCDF4
Requirement already satisfied: xarray in ./.venv/lib64/python3.11/site-packages (2023.12.0)
Requirement already satisfied: netCDF4 in ./.venv/lib64/python3.11/site-packages (1.6.5)
Requirement already satisfied: numpy>=1.22 in ./.venv/lib64/python3.11/site-packages (from xarray) (1.26.3)
Requirement already satisfied: packaging>=21.3 in ./.venv/lib64/python3.11/site-packages (from xarray) (23.2)
Requirement already satisfied: pandas>=1.4 in ./.venv/lib64/python3.11/site-packages (from xarray) (2.1.4)
Requirement already satisfied: cftime in ./.venv/lib64/python3.11/site-packages (from netCDF4) (1.6.3)
Requirement already satisfied: certifi in ./.venv/lib64/python3.11/site-packages (from netCDF4) (2023.11.17)
Requirement already satisfied: python-dateutil>=2.8.2 in ./.venv/lib64/python3.11/site-packages (from pandas>=1.4->xarray) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in ./.venv/lib64/python3.11/site-packages (from pandas>=1.4->xarray) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in ./.venv/lib64/python3.11/site-packages (from pandas>=1.4->xarray) (2023.4)
Requirement already satisfied: six>=1.5 in ./.venv/lib64/python3.11/site-packages (from python-dateutil>=2.8.2->pandas>=1.4->xarray) (1.16.0)
[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.
In [ ]:
import xarray as xr
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
Create a timeseries dataset
This builds the dataset from a pandas DataFrame, but the same data could just as well be read from a CSV file with pd.read_csv.
In [ ]:
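now = datetime.utcnow()
df = pd.DataFrame(
    dict(
        # five daily timestamps starting now, truncated to whole seconds
        time=[(now + timedelta(days=d)).replace(microsecond=0) for d in range(5)],
        # None marks a missing value; pandas stores it as NaN in a float64 column
        temperature=[4, None, 8, 22, -1],
        turbidity=[None, 23.8, 2.5, 32.2, 4.1],
    )
)
# the time index becomes the "time" dimension coordinate of the Dataset
ds = xr.Dataset.from_dataframe(df.set_index(["time"]))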
ds
Out[ ]:
<xarray.Dataset>
Dimensions:      (time: 5)
Coordinates:
  * time         (time) datetime64[ns] 2024-01-17T14:04:20 ... 2024-01-21T14:...
Data variables:
    temperature  (time) float64 4.0 nan 8.0 22.0 -1.0
    turbidity    (time) float64 nan 23.8 2.5 32.2 4.1
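The CSV route mentioned above could look roughly like this. It is a sketch only: the CSV content is hypothetical and held in an io.StringIO buffer so the example is self-contained; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd
import xarray as xr

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """time,temperature,turbidity
2024-01-17T14:04:20,4,
2024-01-18T14:04:20,,23.8
2024-01-19T14:04:20,8,2.5
"""

# parse_dates converts the time column to datetime64; empty fields are read as NaN.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# As in the cell above, time becomes the index and thereby the dimension coordinate.
ds_csv = xr.Dataset.from_dataframe(df.set_index("time"))
print(ds_csv)
```

From here on the dataset can be given coordinates and attributes exactly as in the following cells.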
Update coordinates with location and metadata
An xarray Dataset supports metadata (attributes) on every variable, including the coordinates.
In [ ]:
lat, lon = 60.3833, 5.3443
ds = ds.assign_coords(
    dict(
        # a single fixed station needs no location dimension, so latitude and
        # longitude are stored as scalar (0-dimensional) coordinates
        longitude=xr.Variable((), lon, dict(standard_name="longitude", long_name="Longitude", units="degree_east", axis="X")),
        latitude=xr.Variable((), lat, dict(standard_name="latitude", long_name="Latitude", units="degree_north", axis="Y")),
        # re-create time with CF attributes; .values passes the plain datetime64 array
        time=xr.Variable("time", ds.time.values, dict(standard_name="time", long_name="Time of measurement", axis="T")),
    )
)
ds
Out[ ]:
<xarray.Dataset>
Dimensions:      (time: 5)
Coordinates:
    longitude    float64 5.344
    latitude     float64 60.38
  * time         (time) datetime64[ns] 2024-01-17T14:04:20 ... 2024-01-21T14:...
Data variables:
    temperature  (time) float64 4.0 nan 8.0 22.0 -1.0
    turbidity    (time) float64 nan 23.8 2.5 32.2 4.1
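Because latitude and longitude are scalar coordinates, they add no dimension but still travel with every data variable, each keeping its own attributes. A minimal sketch on a small stand-in dataset (hypothetical values, not the dataset built above):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in dataset with only a time dimension (hypothetical values).
times = pd.date_range("2024-01-17", periods=3, freq="D")
ds = xr.Dataset({"temperature": ("time", np.array([4.0, np.nan, 8.0]))}, coords={"time": times})

# Scalar (0-dimensional) coordinates, each carrying its own CF attributes.
ds = ds.assign_coords(
    latitude=xr.Variable((), 60.3833, {"standard_name": "latitude", "units": "degree_north"}),
    longitude=xr.Variable((), 5.3443, {"standard_name": "longitude", "units": "degree_east"}),
)

# No new dimension is created ...
print(dict(ds.sizes))  # {'time': 3}
# ... but the scalar coordinates are attached to every data variable.
print(sorted(ds.temperature.coords))  # ['latitude', 'longitude', 'time']
print(ds.latitude.attrs["units"])  # degree_north
```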
Add station name
In [ ]:
ds["station_name"] = xr.DataArray("store_lungen", dims=(), attrs=dict(cf_role="timeseries_id"))
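CF uses the attribute cf_role="timeseries_id" to mark the variable that identifies the station in a time-series feature. As a small sketch on a stand-in dataset (not the one above), xarray's Dataset.filter_by_attrs can find that variable again by its attribute:

```python
import numpy as np
import xarray as xr

# Stand-in dataset with one data variable (hypothetical values).
ds = xr.Dataset({"temperature": ("time", np.array([4.0, np.nan, 8.0]))})
ds["station_name"] = xr.DataArray("store_lungen", dims=(), attrs=dict(cf_role="timeseries_id"))

# filter_by_attrs keeps only the variables whose attributes match.
ids = ds.filter_by_attrs(cf_role="timeseries_id")
print(list(ids.data_vars))  # ['station_name']

# The scalar string is stored as a fixed-width unicode value.
print(ds.station_name.dtype)  # <U12
print(ds.station_name.item())  # store_lungen
```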
Add metadata for each data variable
In [ ]:
ds.temperature.attrs["standard_name"] = "sea_water_temperature"
ds.temperature.attrs["long_name"] = "Sea Water Temperature"
ds.temperature.attrs["units"] = "degree_Celsius"
ds.temperature.attrs["comment"] = "I lost the thermometer in Store Lungegårdsvann"
ds.turbidity.attrs["standard_name"] = "sea_water_turbidity"
ds.turbidity.attrs["long_name"] = "Sea Water Turbidity"
ds.turbidity.attrs["units"] = "NTU"
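Before writing the file it can help to check that every data variable actually has the attributes set above. missing_attrs below is a hypothetical helper, not part of xarray, and its list of required attributes is an assumption taken from this notebook rather than a complete CF or S-ENDA checklist:

```python
import numpy as np
import xarray as xr

# Assumption: the per-variable attributes used in this notebook.
REQUIRED_ATTRS = ("standard_name", "long_name", "units")


def missing_attrs(ds: xr.Dataset, required=REQUIRED_ATTRS) -> dict[str, list[str]]:
    """Hypothetical helper: map each data variable to the required attributes it lacks."""
    report = {}
    for name, var in ds.data_vars.items():
        missing = [key for key in required if key not in var.attrs]
        if missing:
            report[name] = missing
    return report


# Stand-in dataset where turbidity is only partly described.
ds = xr.Dataset(
    {
        "temperature": ("time", np.array([4.0, np.nan, 8.0])),
        "turbidity": ("time", np.array([np.nan, 23.8, 2.5])),
    }
)
ds.temperature.attrs.update(
    standard_name="sea_water_temperature", long_name="Sea Water Temperature", units="degree_Celsius"
)
ds.turbidity.attrs["standard_name"] = "sea_water_turbidity"

print(missing_attrs(ds))  # {'turbidity': ['long_name', 'units']}
```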
Assign global attributes
In [ ]:
ds = ds.assign_attrs(
    dict(
        id="e5d54ede-685d-4951-917b-25157ce67314",  # can also be set later
        naming_authority="bb.badebussen",  # can also be set later
        title="Measurements in the middle of Store Lungegårdsvann",
        title_no="Målinger midt i Store Lungegårdsvann",
        summary="Measurements taken at a fixed point in Store Lungegårdsvann during my daily swim",
        summary_no="Målinger tatt på eit fast punkt under min daglige svømmetur i Store Lungegårdsvann",
        # keywords and their vocabularies are comma-separated "PREFIX:value" strings
        keywords=",".join(
            [
                "GCMDSK:EARTH SCIENCE > HUMAN DIMENSIONS > SUSTAINABILITY > SUSTAINABLE DEVELOPMENT",
                "GCMDLOC:CONTINENT > EUROPE > NORTHERN EUROPE > SCANDINAVIA > NORWAY",
            ]
        ),
        keywords_vocabulary=",".join(
            [
                "GCMDSK:GCMD Science Keywords:https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords",
                "GCMDLOC:GCMD Locations:https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/locations",
            ]
        ),
        iso_topic_category="Not available",
        featureType="timeseries",
        date_created=datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"),
        project="Store Lungen",
        # temporal and spatial extent, derived from the data instead of typed by hand
        time_coverage_start=np.datetime_as_string(ds.time.min().values, unit="s", timezone="UTC"),
        time_coverage_end=np.datetime_as_string(ds.time.max().values, unit="s", timezone="UTC"),
        geospatial_lat_min=float(ds.latitude.min()),
        geospatial_lat_max=float(ds.latitude.max()),
        geospatial_lon_min=float(ds.longitude.min()),
        geospatial_lon_max=float(ds.longitude.max()),
        spatial_representation="point",
        creator_type="institution",
        creator_institution="Badebussen",
        institution="Badebussen",
        institution_short_name="BB",
        creator_email="badebussen@lungen.bb",
        creator_url="https://badebussen.bb",
        data_owner="Badebussen",
        processing_level="Operational",
        Conventions="CF-1.7, ACDD-1.3",
        publisher_name="badebussen",
        publisher_email="publisher@badebussen.bb",
        publisher_url="https://badebussen.bb",
        license="http://spdx.org/licenses/CC-BY-4.0(CC-BY-4.0)",
        history="Created on jupyterhub",
    )
)
ds
Out[ ]:
<xarray.Dataset>
Dimensions:       (time: 5)
Coordinates:
    longitude     float64 5.344
    latitude      float64 60.38
  * time          (time) datetime64[ns] 2024-01-17T14:04:20 ... 2024-01-21T14...
Data variables:
    temperature   (time) float64 4.0 nan 8.0 22.0 -1.0
    turbidity     (time) float64 nan 23.8 2.5 32.2 4.1
    station_name  <U12 'store_lungen'
Attributes: (12/33)
    id:                e5d54ede-685d-4951-917b-25157ce67314
    naming_authority:  bb.badebussen
    title:             Measurements in the middle of Store Lundgårdsvann
    title_no:          Målinger midt i Store Lungegårdsvann
    summary:           Measurements taken at a fixed point in Store Lun...
    summary_no:        Målinger tatt på eit fast punkt under min daglig...
    ...                ...
    Conventions:       CF-1.7, ACDD-1.3
    publisher_name:    badebussen
    publisher_email:   publisher@badebussen.bb
    publisher_url:     https://badebussen.bb
    license:           http://spdx.org/licenses/CC-BY-4.0(CC-BY-4.0)
    history:           Created on jupyterhub
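The time_coverage_start, time_coverage_end and date_created attributes all end up in the same ISO 8601 form with a trailing Z. A quick check of that formatting, using hypothetical timestamps in place of ds.time; note that numpy's datetime64 is timezone-naive, so timezone="UTC" simply assumes the stored times are already UTC:

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Hypothetical timestamps standing in for ds.time.values (naive datetime64, treated as UTC).
times = pd.date_range("2024-01-17T14:04:20", periods=5, freq="D").values

# Same calls as in the cell above: min/max rendered at second precision with a "Z" suffix.
start = np.datetime_as_string(times.min(), unit="s", timezone="UTC")
end = np.datetime_as_string(times.max(), unit="s", timezone="UTC")
print(start, end)  # 2024-01-17T14:04:20Z 2024-01-21T14:04:20Z

# date_created is formatted with strftime into the same layout (hypothetical creation time).
created = datetime(2024, 1, 17, 14, 5, 0).strftime("%Y-%m-%dT%H:%M:%SZ")
print(created)  # 2024-01-17T14:05:00Z
```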
Store the dataset
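The encoding is given as a dictionary keyed by variable name. CF does not allow missing values in coordinate variables, so no _FillValue is written for the coordinates, and because some programs do not handle int64, time is stored as int32 seconds since 1970-01-01.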
In [ ]:
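ds.to_netcdf(
    "badebussen.nc",
    # an unlimited time dimension lets new measurements be appended later
    unlimited_dims=["time"],
    encoding=dict(
        # int32 seconds since the epoch, with no _FillValue on the coordinates
        time={"dtype": "int32", "_FillValue": None, "units": "seconds since 1970-01-01 00:00:00"},
        longitude={"_FillValue": None},
        latitude={"_FillValue": None},
    ),
)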
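Storing time as int32 seconds since the epoch keeps the file readable by programs that dislike int64, at the cost of range: a signed 32-bit count of seconds runs out in January 2038. The plain numpy arithmetic below, with a hypothetical timestamp, shows both the encoded value and that limit:

```python
import numpy as np

epoch = np.datetime64("1970-01-01T00:00:00")

# Largest value a signed 32-bit integer can hold.
int32_max = np.iinfo(np.int32).max  # 2147483647

# "seconds since 1970-01-01 00:00:00" stored as int32 therefore ends at this instant.
last_representable = epoch + np.timedelta64(int(int32_max), "s")
print(last_representable)  # 2038-01-19T03:14:07

# Encoding a hypothetical timestamp by hand: whole seconds since the epoch.
t = np.datetime64("2024-01-17T14:04:20")
seconds = (t - epoch) // np.timedelta64(1, "s")
print(seconds, seconds <= int32_max)  # 1705500260 True
```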