Introduction

This document describes general data management practices in line with the FAIR (Findable, Accessible, Interoperable and Reusable) guiding principles for scientific data management, focusing on the management and sharing of dynamic geodata (i.e., geolocated data about processes in nature). The practices aim at a metadata-driven data management regime. The handbook provides a synopsis of the general data management practices of the S-ENDA partners, together with an overview of organisation-specific information that the user organisations can update and merge into the general handbook.

The purpose of the Data Management Handbook (DMH) is threefold:

  1. to provide an overview of the principles for FAIR data management to be employed;
  2. to help personnel identify their roles and responsibilities for good data management; and
  3. to provide personnel with practical guidelines for carrying out good data management.

The intended audience for this DMH is any personnel involved in the process of making data available to the end user. This process can be viewed as a value chain that moves from the producer of the data to the data consumer (i.e., the end user). The handbook describes and summarises the data management principles; details of their practical implementation are described along four pillars: structuring and documenting data; data services; user portals and documentation; and data governance. Practical guidance for data producers is also provided.

The DMH is a strategic governing document and should be used as part of the quality framework the organisation is using.

The S-ENDA partners:

  • Norwegian Meteorological Institute (Meteorologisk institutt - MET)
  • Norwegian Institute for Air Research (Norsk institutt for luftforskning - NILU)
  • Norwegian Institute for Nature Research (Norsk institutt for naturforskning - NINA)
  • Norwegian Institute for Water Research (Norsk institutt for vannforskning - NIVA)

About S-ENDA

S-ENDA is part of a larger effort within the national geodata strategy (“Alt skjer et sted”, "everything happens somewhere"), and relates to this strategy through GeoNorge, which is developed and operated by the Norwegian Mapping Authority (“Kartverket”). GeoNorge, in turn, relates to the European INSPIRE Geoportal through the INSPIRE directive. In particular, S-ENDA is responsible for Action 20 of the Norwegian geodata strategy. The goal of Action 20 is to establish a distributed, virtual data center for use and management of dynamic geodata. S-ENDA’s vision is that everyone, from professional users to the general public, should have easy, secure and stable access to dynamic geodata.

The vision of S-ENDA and the goal of Action 20 are aligned with international guidelines, in particular the FAIR Guiding Principles for scientific data management and stewardship.

S-ENDA in a national and international context

Dynamic geodata is weather, environment and climate-related data that changes in space and time and is thus descriptive of processes in nature. Examples are weather observations, weather forecasts, pollution (environmental toxins) in water, air and sea, information on the drift of cod eggs and salmon lice, water flow in rivers, driving conditions on the roads and the distribution of sea ice. Dynamic geodata provides important constraints for many decision-making processes and activities in society.

GeoNorge is the national website for map data and other location information in Norway. Here, users of map data can search and access such information. Dynamic geodata is one such information type. S-ENDA extends GeoNorge by taking responsibility for harmonising the management of dynamic geodata in a consistent manner.

Architecture

FAIR

The FAIR guiding principles for scientific data management illustrate the best practice for sharing open data. They were first set out in the 2016 paper The FAIR Guiding Principles for scientific data management and stewardship. The principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data.

Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. This includes unique and persistent identifiers, rich metadata, and registration of the (meta)data in a searchable resource.

Accessible: Once the user finds the required data, they need to know how the data can be accessed. This includes using a standardised communications protocol and, where necessary, authentication and authorisation.

Interoperable: The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. This includes FAIR vocabularies.

Reusable: The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well described so that they can be replicated and/or combined in different settings. This includes accurate and relevant attributes, a data usage license, detailed provenance, and meeting domain-relevant community standards.

FAIR in S-ENDA and GeoNorge

All metadata and datasets that the S-ENDA project delivers to Geonorge are assessed against FAIR indicators. The status of various datasets' fulfillment of the FAIR principles, as well as of other requirements from the national geographic infrastructure, can be seen in the Mareano status register. This status register is a first version and will be improved over time. Assessment is done for each of the four main groups in FAIR, and it is possible to achieve 100% for each of the letters (F-A-I-R). The method is based on the same implementation that is used in national and European portals for open data, but the criteria are closely linked to requirements from the national infrastructure for geographic information, especially in terms of accessibility (A) and interoperability (I).

The guide "FAIR for extended use of geographical information" is used as a guideline for the development of automated FAIR assessment of data in Geonorge, the national portal for geographical data (www.geonorge.no).

Data governance

Purpose

Data governance is a data management concept that ensures the availability, usability, integrity, and security of organisational data. It also establishes rules and procedures to ensure data security and compliance throughout the data life cycle. This chapter explains how we organize and direct data management efforts to ensure that:

  • Data guidelines are implemented throughout the organisation;

  • Data management practices are in line with and contribute to the institute’s strategic aims;

  • The data management regime is reviewed and updated accordingly.

S-ENDA partners have a commitment to integrate FAIR data governance practices into geodata frameworks.

Data life cycle management

Data life cycle management is steered by documentation describing how data generated or used in an activity will be handled throughout the lifetime of the activity and after the activity has been completed. This is living documentation that follows the activity and specifies what kind of data will be generated or acquired, how the data will be described, where the data will be stored, whether and how the data can be shared, and how the data will be retired (archived or deleted). The purpose of life cycle management is to safeguard the data, not just during their “active” period but also for future reuse of the data, and to facilitate cost-effective data handling.

This DMH recommends the following concepts of life cycle management to be implemented:

  • An institution specific Data Management Handbook (MET has a common template available here);

  • Extended discovery metadata for data in internal production chains (these are metadata elements that provide the necessary information for life cycle management just described);

  • A Data Management Plan (DMP) document.

The goal is that life cycle management information shall be readily available for every dataset managed by the institute.

Data Management Plan

A Data Management Plan (DMP) is a document that outlines how the data life cycle management will be conducted for datasets utilized and generated in specific projects. Typically, these are externally financed projects for which such documentation is required by funding agencies.

The Research Council (Norges forskningsråd - NFR) has implemented guidelines mandating publicly accessible DMPs from its funded projects. For more details on NFR's guidelines regarding the contents of a DMP, see NFR's website. We recommend adhering to these guidelines for any data management project.

While larger internal projects covering many datasets might benefit from a dedicated document of this nature, generally writing formal DMPs for the various datasets is not cost-effective. In such cases it is more appropriate to employ metadata to steer data handling processes (metadata-steered data management) and thereby automate as much of those processes as possible.
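
Where metadata steering is used, the handling rules can be read from the dataset's own metadata. The sketch below is illustrative only: it assumes life cycle information is stored as global attributes in a NetCDF file, and the attribute retention_period_days is a hypothetical example, not a standard metadata element.

    from datetime import datetime, timedelta, timezone
    import netCDF4

    def should_archive(path):
        """Decide whether a dataset is due for archiving, based only on its
        own metadata (ACDD date_created plus a hypothetical
        retention_period_days attribute)."""
        with netCDF4.Dataset(path) as nc:
            created = datetime.fromisoformat(nc.date_created)
            retention = timedelta(days=int(nc.retention_period_days))
        if created.tzinfo is None:  # treat naive timestamps as UTC
            created = created.replace(tzinfo=timezone.utc)
        return datetime.now(timezone.utc) - created > retention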

Examples of DMP services:

Service   Used by¹
EasyDMP   NIVA, NINA
Argos     NIVA

¹ The institute has used or uses the service.

Example of a Data Management Plan Template

EasyDMP, as an example, is a web tool designed to facilitate and standardize the creation of Data Management Plan (DMP) documents. It can be customized according to specific governance requirements by each institution. It provides a list of questions to be answered by the person responsible for providing the DMP.

An EasyDMP template is divided into sections:

  • Data Summary -> A general introduction about data types, sources, quantity, targets, context.
  • FAIR Data -> Expanding on Findability, Accessibility, Interoperability, Reusability.
  • Allocations of Resources -> Infrastructures, funding, responsible persons.
  • Data Security -> Does the data require specific restrictions or limited access?
  • Ethical Aspects -> Any special consideration about the specific ethical implications related to data?
  • Other -> Any additional information not covered before.

Data Structure and Documentation

Structuring and documenting dynamic geodata shared by multiple providers (the S-ENDA partners) is essential to ensure that data is findable, understandable, reusable and interoperable. Data need to be structured, encoded, and delivered in agreed formats, documented with rich and standard metadata, and supported by standardized vocabularies. Both discovery metadata and use metadata are exposed in a centralized catalog, which harvests metadata from the original sources through standard protocols and automated procedures. An essential prerequisite for structuring and documenting data is the specification of the dataset(s). The dataset is the basic building block of the data management models adopted in the S-ENDA context.
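
As a sketch of such automated harvesting, the snippet below lists record identifiers from an OAI-PMH endpoint. The endpoint is the NILU OAI-PMH service mentioned later in this handbook (URL scheme assumed); oai_dc is the baseline OAI-PMH metadata format.

    import requests
    import xml.etree.ElementTree as ET

    # List record identifiers from an OAI-PMH endpoint
    resp = requests.get(
        "https://ebas-oai-pmh.nilu.no/oai/",
        params={"verb": "ListIdentifiers", "metadataPrefix": "oai_dc"})
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    for identifier in root.iter(OAI + "identifier"):
        print(identifier.text)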

This chapter provides explanations of central elements, such as datasets, metadata, and vocabulary. Each subchapter goes into more detail on the requirements of the FAIR guiding principles, and lastly you can read more about the institute-specific data structures. An overall description of the FAIR principles is given in the previous chapter FAIR.

Dataset

Datasets form the building block of any data management model. Dataset specification is the first step in structuring data for efficient management. A dataset is a collection of data. Typically, a dataset represents a number of variables in time and space. Datasets may be categorised by:

  • Source, such as observations (in situ, remotely sensed) and numerical model projections and analyses;
  • Processing level, such as "raw data", calibrated data, quality-controlled data, derived parameters, temporally and/or spatially aggregated variables;
  • Data type, including point data, sections and profiles, lines and polylines, polygons, gridded data, volume data, and time series (of points, grids etc.).

Data having all of the same characteristics in each category, but different independent variable ranges and/or responding to a specific need, are normally considered part of a single dataset. In the context of data preservation a dataset consists of the data records and their associated knowledge (information, tools).

In order to best serve the data through web services, the following principles are useful for guiding the dataset definition:

  • A dataset can be a collection of variables stored in, for example, a relational database or as flat files;
  • A dataset is defined as a number of spatial and/or temporal variables;
  • A dataset should be defined by the information content and not the production method;
  • A good dataset does not mix feature types, i.e., trajectories and gridded data should not be present in the same dataset.

In order to support the FAIR principles, the following guidelines must apply to datasets:

  • Be assigned a globally unique and persistent identifier.
  • Be described with rich metadata.
  • Be registered or indexed in a searchable resource.
  • Be retrievable by their identifier using a standardised communications protocol, which is open, free, universally implementable and allows for authentication where necessary.
  • Be released with a clear and accessible data usage license.
  • Be associated with detailed provenance.
  • Use a vocabulary that follows FAIR principles.
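
As a minimal illustration of how some of these guidelines can be checked in practice, the sketch below inspects a NetCDF file for a few ACDD global attributes that carry the identifier, license and provenance information listed above; the attribute selection and file name are examples only.

    import netCDF4

    # ACDD attributes backing the FAIR guidelines above: id/naming_authority
    # (persistent identifier), license (usage license), history (provenance)
    EXPECTED = ["id", "naming_authority", "title", "summary", "license", "history"]

    with netCDF4.Dataset("example.nc") as nc:  # example file name
        for attr in EXPECTED:
            print(f"{attr:20s}", "OK" if attr in nc.ncattrs() else "MISSING")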

Metadata

Metadata is a broad concept. It provides descriptive, structural, or administrative information about data, facilitating its identification, management, understanding, and use. Metadata is essential for organizing, finding, and understanding the context of data.

In our data management model the term "metadata" is used in several contexts, specifically the five categories: discovery, use, site, configuration, and system metadata.

In order to support the FAIR principles, the following guidelines must apply to metadata:

  • Metadata must be self-explanatory and not require additional information.
  • Metadata must include the data identifier clearly and explicitly.
  • Metadata must document the quality of the data.
  • Metadata must be registered or indexed in a searchable resource.
  • Metadata must clearly identify the data owner.
  • Metadata must use a vocabulary that follows FAIR principles.
  • Metadata must be accessible even when the data are no longer available.

Metadata types

Discovery metadata
  Purpose: Used to find relevant data.
  Description: Discovery metadata are also called index metadata and are a digital version of the library index card. They describe who did what, where and when, how to access data and potential constraints on the data. They shall also link to further information on the data, such as site metadata.
  Examples: ISO 19115; GCMD/DIF

Use metadata
  Purpose: Used to understand the data found.
  Description: Use metadata describe the actual content of a dataset and how it is encoded. The purpose is to enable the user to understand the data without any further communication. They describe the content of variables using standardised vocabularies, units of variables, encoding of missing values, map projections, etc.
  Examples: Climate and Forecast (CF) Convention; BUFR; GRIB

Site metadata
  Purpose: Used to understand the data found.
  Description: Site metadata are used to describe the context of observational data. They describe the location of an observation, the instrumentation, procedures, etc. To a certain extent they overlap with discovery metadata, but also extend discovery metadata. Site metadata can be used for observation network design, and can be considered a type of use metadata.
  Examples: WIGOS; OGC O&M

Configuration metadata
  Purpose: Used to tune portal services for datasets intended for data consumers (e.g., WMS).
  Description: Configuration metadata are used to improve the services offered through a portal to the user community. This can, e.g., be how to best visualise a product.

System metadata
  Purpose: Used to understand the technical structure of the data management system and track changes in it.
  Description: System metadata cover, e.g., technical details of the storage system, web services, their purpose and how they interact with other components of the data management system, available and consumed storage, number of users and other KPI elements, etc.

Vocabulary

An important part of ensuring the FAIRness of the data is to adopt standardized and controlled vocabularies to describe the data and metadata. Controlled vocabularies are controlled lists of terms used to organize information. They require the use of predefined terms to ensure consistency and accuracy in documenting data for easy dataset discovery. The vocabularies themselves shall also follow the FAIR principles, highlighted by the I2 (Interoperability 2) principle. In practice, this means that vocabularies should ideally be:

  • Findable -> registered (indexed, listed) in a vocabulary service
  • Accessible -> available on the web, downloadable
  • Interoperable -> encoded in a standard representation, such as the Web Ontology Language (OWL) or Simple Knowledge Organization System (SKOS) and domain-specific extensions
  • Reusable -> licensed and maintained, ideally with an open license (Source: FAIR vocabularies)

The formalised vocabularies agreed upon during the project aim to facilitate data search in the ADC portal across the S-ENDA partner data centres. Ideally, each institute provides a FAIR-compliant machine-to-machine access point for harvesting vocabularies, enabling mapping of internal vocabularies to common reference vocabularies (e.g., REST APIs or SPARQL endpoints for vocabulary servers).
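
As a sketch of such a machine-to-machine access point, the snippet below searches a vocabulary service for a term. It assumes the service exposes a Skosmos-style REST API; the METNO Vocabulary Service is used as an example base URL, and the query term is arbitrary.

    import requests

    # Search a Skosmos-style vocabulary service for concepts matching a term
    BASE = "https://vocab.met.no/rest/v1"  # assumed Skosmos REST endpoint
    resp = requests.get(f"{BASE}/search",
                        params={"query": "sea_ice*", "lang": "en"})
    resp.raise_for_status()
    for hit in resp.json().get("results", []):
        print(hit.get("uri"), "-", hit.get("prefLabel"))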

The table below shows the current domain-specific reference vocabularies together with vocabulary servers for each institute.

Institute   Domain-specific reference vocabularies      Vocabulary server
MET         CF Standard Names, GCMD Science Keywords    METNO Vocabulary Service
NILU        CF Standard Names                           ACTRIS Vocabulary
NINA        GEMET, EnvThes                              Under development
NIVA        CF Standard Names, GCMD Science Keywords    Under development

MET Norway

Current practice

There are essentially two types of dynamic geodata at MET Norway: gridded data and non-gridded data.

Gridded data

Gridded data include numerical and statistical model output and many satellite-derived datasets. MET Norway employs NetCDF/CF as the primary method for structuring gridded data. For documentation, CF-compliance ensures the availability of adequate use metadata in each file...

WMO membership obliges the institute to provide and handle GRIB files; this is done for several datasets, but GRIB does not provide use or discovery metadata in itself.

Non-gridded data

Non-gridded data include most observational data, such as observations at weather stations, radiosondes, CTD profiles in the ocean, etc. These diverse data types are maintained in many different solutions, ranging from files (e.g., weather station observations extracted from WMO GTS in WMO BUFR format and subsequently converted to NetCDF/CF for easier usage) to DBMS solutions (e.g., the institute’s weather station climate data storage). Software (frost2nc) for conversion of output from the climate data storage to NetCDF/CF for user consumption is available for some stations...

The WMO standard for observations is BUFR, which is also supported but is not self-describing. The institute also maintains other types of data, like textual representations of the weather forecast. Most of these are currently not in any standard format, a notable exception being newly available warnings in CAP format, which is a WMO/OASIS standard format.

NILU

Current practice

NILU handles several types of data, one of the main types being point in-situ measurements. More information and publicly available data can be found here: Open data.

Time series data

Typical EBAS time series of point in-situ measurements are submitted from many different data sources around the globe in the EBAS NASA Ames format. The EBAS database (EBAS core, consisting of software and a relational database) is used for quality control and long-term storage of the data. For data dissemination, harmonised secondary datasets in the NetCDF format are generated and served via a THREDDS Data Server (TDS) using the THREDDS catalog protocol, ISO metadata, and data access through OPeNDAP and HTTP NetCDF download. Metadata for those data are harvested by MET Norway for the NorDataNet project.
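
For data consumers, such OPeNDAP endpoints can be opened directly, without downloading the whole file. The sketch below uses the netCDF4 Python library; the dataset path is a placeholder to be replaced with a real path from the THREDDS catalog.

    import netCDF4

    # Open a dataset over OPeNDAP; <dataset-path> is a placeholder
    url = "https://thredds.nilu.no/thredds/dodsC/<dataset-path>"
    with netCDF4.Dataset(url) as nc:
        print(nc.title)                   # ACDD discovery metadata
        print(list(nc.variables.keys()))  # CF-described variables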

Gridded data

A set of gridded deposition data for Norway is generated, stored as NetCDF files and published on a THREDDS Data server in the S-ENDA project. The additional use of the WMS protocol for data access facilitates the visualisation of the 2D dataset.

NIVA

Current practice

Data are usually stored in databases such as Oracle or, for newer dataflows, PostgreSQL. Some of these data are publicly available through AquaMonitor or for visualization using Superset.

Time Series

Loggers are a large source of time series data at NIVA, and these data are stored in a DBMS solution. Some of the data is shared with partners using a REST API. For public access, NIVA can share time series using OPeNDAP and the CF convention.

Trajectories

Sensors placed onboard ships, FerryBoxes, provide the typical trajectories sampled by NIVA. For trajectories, metadata is stored in a NoSQL solution, while data is stored in a PostgreSQL database following an internal schema. A REST API is used to share data upon request, while an internal approval process is used before publishing data for public access. When published, data can be shared through OPeNDAP, following the CF convention.

NINA

Current practice

Data is stored in multiple formats and structures, such as databases, tabular files, geospatial and gridded data, and images.

Species observation

One of the main types of data is point observations of species, delivered in the Darwin Core standard. Sampling of genetic material in nature converges into huge databases of biodiversity genetic data.

Time series

Automated positioning of individuals by GPS devices results in spatial animal movement data. Regular surveying of the same variables in the same locations results in time series of biodiversity data.

Data management

“Good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process.”

– Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016).

Data management is the term used to describe the handling of data in a systematic and cost-effective manner. The data management regime should be continuously evolving, to reflect the evolving nature of data collection.

Data is processed and interpreted to generate knowledge (e.g., about the weather) for end users. The knowledge can be presented as information in the form of actual data, illustrations, text or other forms of communication. In this context, an illustration is a representation of data, whereas data means the numerical values needed to analyse and interpret a natural process (i.e., calibrated or with calibration information; it must be possible to understand the meaning of the numerical value from the available and machine-readable information).

Advanced users typically consume some type of data in order to process and interpret it, and produce new knowledge, e.g., in the form of a new dataset or other information. The datasets can be organised in different levels, such as the WMO WIGOS definition for levels of data. Less advanced users apply information based on data (e.g., an illustration) to make decisions (e.g., clothing adapted to the forecast weather).

Between the data providers and data consumers are the processes that manage and deliver the datasets (Figure 1). A number of human roles may be defined with responsibilities that, together, ensure that these processes are carried out in accordance with the data management requirements of the organisation. The definition and filling of these roles depend heavily on the particular organisation, and each organisation must devise its own best solution.

Figure 1. Value chain for data

All the S-ENDA partners work to integrate the FAIR Guiding Principles for scientific data management into their routines. In this chapter we outline some of the differences in the institute-specific data management practices, the S-ENDA architecture, the datasets the S-ENDA partners have agreed to deliver, and a list of relevant personnel at each of the partners.

Data management at MET Norway

At MET Norway, data have always been managed, albeit in a rather narrow technical sense. The main focus has been on storage, primarily for operational data. More recently, the issues of delivering, sharing and reusing data have steadily gained prominence. Still, the legacy of data storage systems built up over many years is an important factor in any uniform data management program.

During the International Polar Year (IPY), MET Norway was the international coordinator for operational data. IPY data management was based on the principle of distributed data management through exchange of standardised descriptions of datasets between contributing data centres.

At the global level, the institute is the primary representative for Norway in the WMO. WMO has in recent years reorganised its approach to documenting and sharing data through two major activities: WIS and WIGOS. Both are metadata-driven activities that follow the same principles as Geonorge and INSPIRE, although there are differences concerning standards required.

Data management at NIVA

NIVA works with complex data across many:

  • Types: biological, ecological, physical, chemical
  • Matrices: water, sediment, biota
  • Sampling methods: sensors/loggers, FerryBox, manual sampling
  • Data structures: time series along transect or in fixed position, classic stations, etc.
  • Environments: rivers, lakes, marine open water, etc.

The complexity of NIVA’s data has been shown to be a challenge when it comes to overall data sharing. Nevertheless, much of our data is available in several data portals such as Vannmiljø, Artsdatabanken, and ICES, and NIVA is actively working on making its data more findable and interoperable. We are a partner in many EU projects and data management responsible for some (e.g. PAPILLONS).

Data management at NILU

NILU is one of Europe’s leading institutions for storage of data on atmospheric composition. At present, NILU’s thematic databases collect, organize and make available data on behalf of, among others, the UN, the World Meteorological Organization (WMO), several European research infrastructures and the European Space Agency (ESA). The information gathered in these databases is openly available and free of charge to anyone who wants access to atmospheric research data, and can be accessed through several data portals such as EBAS, EVDC and ACTRIS.

More about the storage of such data at NILU can be found here: Open data.

Furthermore, NILU monitors data for the following areas in Norway:

  • Airborne transport, including environmental contaminants
  • Climate
  • Ozone layer and UV

NILU’s environmental monitoring takes place at so-called background stations (NILU's observatories and monitoring stations) that are positioned to be minimally affected by nearby sources of emissions.

Lastly, NILU is the National Air Quality Reference Laboratory of Norway, appointed by the Norwegian Environment Agency. This effort is being undertaken in accordance with the European Union’s air quality directive 2008/50/EC and the Norwegian Pollution Control Act. Each measurement network operator reports quality-assured data on a monthly basis to the national database for local air quality. Real-time air quality status and the air quality status relative to pollution limits are available on the national web portals www.luftkvalitet.info and www.luftkvalitet.nilu.no.

Data management at NINA

NINA's focus on nature and human-nature interactions, biodiversity, and terrestrial ecology requires the management of a wide variety of data from different sources, in multiple formats. Based on specific agreements with national and international partners, some workflows for data sharing have been implemented; for example, the species observation data delivered to the Norwegian Biodiversity Information Centre (Artsdatabanken) reaching the Global Biodiversity Information Facility (GBIF), or the geospatial data delivered to the Norwegian Mapping Authority data portal Geonorge.

Most data workflows have traditionally followed a project-based approach, with data sharing solutions based on specific project requirements. Data management in the past was scattered and lacked standardization (with few exceptions), and data was often invisible or unreachable to external stakeholders.

Over time, many different workflows and solutions for storing and organizing data were designed and implemented, with some attempts to centralize the storage of specific data types. Geospatial data, for example, was organized in internal file systems with naming conventions, and most of the data was stored in centralized databases of growing dimensions.

NINA is currently transitioning to a FAIR data-sharing model, adopting International standards and restructuring the internal policies and workflows for storing, sharing, and delivering data and metadata. Centralized metadata catalogs will allow more distributed data management with optimized solutions for finding, accessing, and processing data. Every new starting project is required to provide a Data Management Plan using the easyDMP tool.

S-ENDA Architecture

FAIR data requires the smart use of various systems and solutions that are in continuous development. Infrastructures, systems, and registers are support functions around the datasets, while fulfillment of the FAIR principles requires that the datasets are well defined and well described. It is therefore important to focus on good definitions of the datasets to ensure availability, including storage formats and information content. The diagram below defines S-ENDA's architecture.

[Architecture landscape diagram]

The landscape diagram defines integration points between the partners' data infrastructures, including interfaces and metadata formats.

Data storage formats and metadata specifications in S-ENDA

The tables below describe the datasets the partners in S-ENDA have agreed to deliver. They include storage formats and metadata specifications with the agreed integration points needed to integrate with geonorge.no and/or the Arctic Data Center.

MET

Deposisjonskart for nedfall. Data fra Arome-Arctic og METCOOP varslingsmodeller
  • Variables:
  • Time period per data set: Start - reference forecast time (00), Stop - +66
  • Geographical division/coverage: grid over Svalbard (AA) and the Nordic region (MEPS) with 2.5 km x 2.5 km cells
  • Data storage format: NetCDF-CF
  • Storage format for usage metadata: NetCDF-CF
  • Storage format for search metadata: XML
  • Storage format for provenance metadata: XML(?)
  • Visualization service: OGC WMS
  • Access service: thredds.met.no

NILU

Deposisjonskart for nedfall av forurensning i Norge
  • Variables: one component per dataset
  • Time period per data set: one year
  • Geographical division/coverage: grid over Norway; 50x50 km, 1x1 km, 0.1x0.1 degree
  • Data storage format: NetCDF files in the THREDDS directory
  • Storage format for usage metadata: CF, ACDD, EBAS
  • Storage format for search metadata: ISO19115 (OAI-PMH service)
  • Visualization service: WMS
  • Access service: NetCDF file download, OPeNDAP and WMS
  • Search service: OAI-PMH at NILU, OGC CSW and human web search interface at adc.met.no

Datasett til Nordatanet (Spesielt CO2 på zeppelin)
  • Variables: time series of measurement values; time resolution varies from an hour to a month. Some variables are multidimensional
  • Time period per data set: variable, 1 year to several decades
  • Geographical division/coverage: one measurement point per data set
  • Data storage format: NetCDF files in the THREDDS directory
  • Storage format for usage metadata: CF, ACDD, EBAS
  • Storage format for search metadata: ISO19115 (OAI-PMH service)
  • Access service: NetCDF file download and OPeNDAP at thredds.nilu.no
  • Search service: OAI-PMH at NILU (ebas-oai-pmh.nilu.no/oai/), OGC CSW and human web search interface at adc.met.no
NIVA

Kart for overskridelser av tålegrenser for nedfall av forurensning i Norge
  • Variables: sswc, fab
  • Time period per data set: Start: 2017, End: 2021
  • Geographical division/coverage:
  • Data storage format: NetCDF
  • Storage format for usage metadata: CF-NetCDF
  • Storage format for search metadata: ACDD
  • Access service: thredds.t.niva.no (log in required)
  • Search service: adc.met.no

SIOS sensor buoy in Adventfjorden
  • Variables: temperature, conductivity, salinity, turbidity, fDOM, chlorophyll, oxygen and light backscattering
  • Time period per data set:
  • Geographical division/coverage: point
  • Data storage format: NetCDF
  • Storage format for usage metadata: CF-NetCDF
  • Storage format for search metadata: ACDD (xml/NetCDF)
  • Access service: thredds.t.niva.no
  • Search service: machine search service, OGC CSW through adc.met.no

MULTISOURCE/DigiVEIVANN
  • Variables: (temperature for inlet), turbidity, water level, conductivity
  • Time period per data set: 24 hours duration
  • Aggregate dataset: Start - 2022-09-14T11:14:59; End - no time_coverage_end (continuously adding new data)
  • Geographical division/coverage: point (two datasets: inlet and outlet at a location in Oslo)
  • Data storage format: NetCDF
  • Storage format for usage metadata: CF-NetCDF
  • Storage format for search metadata: ACDD (xml/NetCDF)
  • Access service: thredds.t.niva.no
  • Search service: machine search service, OGC CSW through adc.met.no; human search service, adc.met.no

NINA

Norwegian breeding bird monitoring scheme
  • Variables: number of observed pairs
  • Time period per data set: Start: 2006, End: -
  • Geographical division/coverage: location accumulated to one common point per assessment route
  • Data storage format: DarwinCore Archive
  • Storage format for search metadata: Norwegian INSPIRE xml
  • Access service: ipt.nina.no

InsectDB
  • Variables:
  • Time period per data set: Start: 1821, End: -
  • Geographical division/coverage: points
  • Data storage format: DarwinCore Archive
  • Storage format for search metadata: Norwegian INSPIRE xml
  • Access service: ipt.nina.no
  • Landing page: NINA insect database

Skandobs
  • Variables:
  • Time period per data set: Start: 2001, End: -
  • Geographical division/coverage: points
  • Data storage format: DarwinCore Archive
  • Storage format for search metadata: Norwegian INSPIRE xml
  • Search service: OGC CSW

Contacts and roles

NIVA
  • Dag Hjermann, Environmental Data Science Section Leader. Projects, collaborations. Contact: dhj@niva.no
  • Ivana Huskova, Data Specialist. Data management planning, documentation and architecture, data workflow improvements, vocabulary. Contact: ivana.huskova@niva.no
  • Kim Leirvik, Tech Lead. Datasets, NIVAcloud, data management, NetCDF, Thredds server, systems architecture. Contact: kim.leirvik@niva.no
  • Viviane Girardin, Systems Engineer. Data management, S-ENDA's DMH. Contact: viviane.girardin@niva.no

NILU
  • Kjetil Tørseth, Research Director - Atmosphere and Climate Research. Overall responsibility for EBAS at NILU. Contact: kt@nilu.no
  • EBAS data curator(s). Curation and quality assurance of EBAS data. Contact: ebas@nilu.no
  • EBAS scientific expert(s). Scientifically responsible for certain components of EBAS data. Contact: ebas@nilu.no
  • Paul Eckhart, Senior System Developer. EBAS database architecture and core software development. Contact: pe@nilu.no
  • Markus Fiebig, Senior Scientist. Director of the ACTRIS In Situ Data Centre unit, vocabulary. Contact: mf@nilu.no
  • Cathrine Lund Myhre, Senior Scientist. Director of the ACTRIS Data Centre. Contact: clm@nilu.no
  • Wenche Aas, Senior Scientist. Responsible for work related to the Chemical Co-ordinating Centre of EMEP (CCC). Contact: wa@nilu.no
  • Ann Mari Fjæraa, Senior Scientist. Responsible for work related to the EVDC project. Contact: amf@nilu.no
  • Richard Rud, System Developer. Responsible for the ACTRIS data portal. Contact: ror@nilu.no

NINA
  • Roald Vang, Technical Group Lead. Steering group, project overview. Contact: roald.vang@nina.no
  • Matteo De Stefano, Data Engineer. Data management, documentation, vocabularies. Contact: matteo.destefano@nina.no
  • Francesco Frassinelli, Data Engineer. Data management, system architecture. Contact: francesco.frassinelli@nina.no
  • Frode Thomassen Singsaas, Librarian. Vocabularies. Contact: frode.singsaas@nina.no

MET
  • Steering Group: the directors of MET, headed by the Director General, Roar Skålin. Responsible for prioritisation, strategic decisions, and resource commitment from the departments. Contact: roars@met.no
  • Morten Wergeland Hansen, Data Manager (DM). Overall responsibility for ensuring that the institute's data management regime is implemented and followed throughout the organisation. The DM is located in the Research and Development Department, Division for Remote Sensing and Data Management (RSDM). Contact: mortenwh@met.no
  • Data Management Group (DMG): Solfrid Agersten (Værvarslingsdivisjonen), Nina Elisabeth Larsgård (Observasjons- og klimadivisjonen), Eivind Støylen (Senter for utvikling av varslingstjenesten), Jan Ivar Pladsen (IT-divisjonen). The DMG supports the DM, project managers and line managers in the execution of their duties, and acts as the local point of contact for data management in the organisation. Contacts: solfrida@met.no, ninael@met.no, eivinds@met.no, janip@met.no
  • Service Organisation (SO): Gjermund Mamen Haugen (Værvarslingsdivisjonen), Lara Ferrighi (Forsknings- og utviklingsdivisjonen - Fjernmåling og Dataforvaltning (FoU-FD)), Erik Totlund (IT-divisjonen). The SO is responsible for operation, technical development and platform management of the services for long-term data. Contacts: gjermundmh@met.no, laraf@met.no, erikt@met.no
  • Remote Sensing and Data Management department (RSDM). Maintains the responsibility for alignment and coordination with national and international standardisation efforts and modern data management procedures, with participation of other units as required.

Practical guides

This chapter includes how-to's and other practical guidance for data producers.

Create a Data Management Plan (DMP)

A Data Management Plan (DMP) is a document that describes how the data life cycle management will be carried out for datasets used and produced in specific projects. Generally, these are externally financed projects for which such documentation is required by funding agencies. However, larger internal projects covering many datasets may also find it beneficial to create a specific document of this type.

The funding agency of your project will usually provide requirements, guidelines or a template for the DMP. If this is not the case, Science Europe offers a comprehensive practical guide to research data management.

For researchers with little or no experience in data management, a simple way of creating a DMP is with EasyDMP.

Using easyDMP

1. Log in to easyDMP; use Dataporten if your institution supports it, otherwise pick one of the other login methods.

2. Click on + Create a new plan and pick a template.

3. By using the Summary button from page two onwards, you can get an overview of all the questions.

Publishing the plan

Currently you can use the export function in easyDMP to download an HTML or PDF version of the DMP and use it further. This might change if "Hosted DMP" gets implemented.

Introduction

NetCDF (Network Common Data Form) is a format designed to facilitate handling of array-oriented scientific data. For a more detailed description, please visit Unidata's NetCDF user's guide.

Submitting data as NetCDF-CF

Workflow

1. Define your dataset
2. Create a NetCDF-CF file
  • Add discovery metadata as global attributes (see the global attributes section)
  • Add variables and use metadata following the CF conventions
3. Store the NetCDF-CF file in a suitable location, and distribute it via thredds or another DAP server
4. Register your dataset in a searchable catalog
  • Test that your dataset contains the necessary discovery metadata and create an MMD xml file
  • Test the MMD xml file
  • Push the MMD xml file to the discovery metadata catalog

Creating NetCDF-CF files

By documenting and formatting your data using NetCDF following the CF conventions and the Attribute Convention for Data Discovery (ACDD), MMD (metadata) files can be automatically generated from the NetCDF files. The CF conventions provide a controlled vocabulary giving a definitive description of what the data in each variable represent, and of the spatial and temporal properties of the data. The ACDD vocabulary describes attributes recommended for describing a NetCDF dataset to data discovery systems. See, e.g., the netCDF4-python docs or the xarray docs for documentation about how to create NetCDF files.
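
As a minimal sketch of what those docs describe, the snippet below creates a small NetCDF file with one CF-described variable and a few ACDD global attributes; the file name, variable and values are placeholders.

    import numpy as np
    from netCDF4 import Dataset

    with Dataset("example.nc", "w", format="NETCDF4") as nc:
        nc.createDimension("time", 3)
        time = nc.createVariable("time", "f8", ("time",))
        time.standard_name = "time"  # CF use metadata
        time.units = "seconds since 1970-01-01 00:00:00 +0000"
        time[:] = [0, 3600, 7200]

        t = nc.createVariable("air_temperature", "f4", ("time",))
        t.standard_name = "air_temperature"  # CF standard name
        t.units = "K"
        t[:] = np.array([271.3, 272.1, 272.8])

        # A few ACDD discovery attributes; see the tables below for the
        # full set of required and recommended attributes
        nc.title = "Example air temperature time series"
        nc.summary = "Minimal example of a NetCDF-CF file with ACDD attributes."
        nc.Conventions = "CF-1.10, ACDD-1.3"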

The ACDD recommendations should be followed in order to properly document your NetCDF-CF files. The tables below summarize the required and recommended ACDD attributes, as well as additional attributes required to support interoperability between different information systems (e.g., GCMD/DIF, the INSPIRE and WMO profiles of ISO19115, etc.).

Notes

Keywords describe the content of your dataset following a given vocabulary. You may use any vocabulary to define your keywords, but a link to the keyword definitions should be provided in the keywords_vocabulary attribute. This attribute provides information about the vocabulary defining the keywords used in the keywords attribute. Example:

    :keywords_vocabulary = "GCMDSK:GCMD Science Keywords:https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords, GEMET:INSPIRE Themes:http://inspire.ec.europa.eu/theme, NORTHEMES:GeoNorge Themes:https://register.geonorge.no/metadata-kodelister/nasjonal-temainndeling" ;
    

Note that the GCMDSK, GEMET and NORTHEMES vocabularies are required for indexing in S-ENDA and Geonorge.

The keywords should be provided by the keywords attribute as a comma separated list with a short name defining the vocabulary used, followed by the actual keyword, i.e., short_name:keyword. Example:

    :keywords = "GCMDSK:Earth Science > Atmosphere > Atmospheric radiation, GEMET:Meteorological geographical features,
    GEMET:Atmospheric conditions, NORTHEMES:Weather and climate";

See https://adc.met.no/node/96 for more information about how to define the ACDD keywords.

A data license provides information about any restrictions on the use of the dataset. To support a linked data approach, it is strongly recommended to use identifiers and URLs from https://spdx.org/licenses/ and a form similar to the example below, using elements from the SPDX license list. Example:

    :license = "http://spdx.org/licenses/CC-BY-4.0(CC-BY-4.0)" ;

Default global attribute values for MET Norway

The below is an example of selected global attributes, as shown from the output of ncdump -h <filename>:

    // global attributes:
    :creator_institution = "Norwegian Meteorological Institute" ;
    :institution = "Norwegian Meteorological Institute" ;
    :institution_short_name = "MET Norway" ;
    :keywords = "GCMDSK:<to-be-added-by-data-creator>, GEMET:<to-be-added-by-data-creator>, GEMET:<to-be-added-by-data-creator>, GEMET:<to-be-added-by-data-creator>, NORTHEMES:<to-be-added-by-data-creator>" ;
    :keywords_vocabulary = "GCMDSK:GCMD Science Keywords:https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords, GEMET:INSPIRE Themes:http://inspire.ec.europa.eu/theme, NORTHEMES:GeoNorge Themes:https://register.geonorge.no/metadata-kodelister/nasjonal-temainndeling" ;
    :license = "http://spdx.org/licenses/CC-BY-4.0(CC-BY-4.0)" ;
    :naming_authority = "no.met" ;
    :publisher_name = "Norwegian Meteorological Institute" ;
    :publisher_type = "institution" ;
    :publisher_email = "csw-services@met.no" ;
    :publisher_url = "https://www.met.no/" ;

Global attribute values for other S-ENDA partners can be added as shown:

    // global attributes:
        :creator_institution = "<to-be-added>" ;
        :institution = "<to-be-added>" ;
        :institution_short_name = "<to-be-added>" ;
        :keywords = "GCMDSK:<to-be-added-by-data-creator>, GEMET:<to-be-added-by-data-creator>, GEMET:<to-be-added-by-data-creator>, GEMET:<to-be-added-by-data-creator>, NORTHEMES:<to-be-added-by-data-creator>" ;
        :keywords_vocabulary = "GCMDSK:GCMD Science Keywords:https://gcmd.earthdata.nasa.gov/kms/concepts/concept_scheme/sciencekeywords, GEMET:INSPIRE Themes:http://inspire.ec.europa.eu/theme, NORTHEMES:GeoNorge Themes:https://register.geonorge.no/metadata-kodelister/nasjonal-temainndeling" ;
        :license = "http://spdx.org/licenses/CC-BY-4.0(CC-BY-4.0)" ;
        :naming_authority = "<to-be-added>" ;
        :publisher_name = "<publisher-institution-name>" ;
        :publisher_type = "institution" ;
        :publisher_email = "<to-be-added>" ;
        :publisher_url = "<to-be-added>" ;
    

Example script to update a NetCDF-CF file with correct discovery metadata

We use a test dataset containing the U component of wind at 10 m height, x_wind_10m, extracted from a MEPS 2.5 km file (~3.5G). The file can be downloaded by clicking on the link at access point 2, HTTPServer. Below is an example python script to add the required discovery metadata to the NetCDF-CF file. The resulting file can be validated by the script scripts/nc2mmd.py in py-mmd-tools, which can also parse the NetCDF-CF file into an MMD xml file. The script becomes available as the endpoint nc2mmd in the environment where the py-mmd-tools repo is installed. A warning that the CF attribute featureType is missing will be printed, but this is not a concern for this dataset, since it is gridded data, and featureType only applies to discrete sampling geometries (see https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#_features_and_feature_types).

    #!/usr/bin/env python
    from datetime import datetime, timezone
    import sys
    import netCDF4
    from uuid import uuid4
    
    with netCDF4.Dataset(sys.argv[1], "a") as f:
        f.id = str(uuid4())
        f.naming_authority = "no.met"
    
        f.creator_institution = "Norwegian Meteorological Institute"
        f.delncattr("creator_url")  # not needed, replaced by publisher_url
    
        f.institution = "Norwegian Meteorological Institute"
        f.institution_short_name = 'MET Norway'
    
        f.title = "U component of wind speed at ten meter height (example)"
        f.title_no = "U-komponent av vindhastighet i ti meters høyde (eksempel)"
        f.summary = (
            "Test dataset to demonstrate how to document a netcdf-cf file"
            " containing wind speed at 10 meter height from a MEPS 2.5km "
            "dataset with ACDD attributes.")
        f.summary_no = (
            "Test datasett for å demonstrere hvordan du dokumenterer en "
            "netcdf-cf-fil som inneholder vindhastighet i 10 meters "
            "høyde fra et MEPS 2,5 km datasett med ACDD-attributter.")
        f.references = "https://github.com/metno/NWPdocs/wiki (Users guide)"
    
        f.keywords = (
            "GCMDSK:Earth Science > Atmosphere > Atmospheric winds, "
            "GEMET:Meteorological geographical features, "
            "GEMET:Atmospheric conditions, "
            "GEMET:Oceanographic geographical features, "
            "NORTHEMES:Weather and climate")
        f.keywords_vocabulary = (
            "GCMDSK:GCMD Science Keywords:https://gcmd.earthdata.nasa.gov"
            "/kms/concepts/concept_scheme/sciencekeywords, "
            "GEMET:INSPIRE themes:https://inspire.ec.europa.eu/theme, "
            "NORTHEMES:GeoNorge Themes:https://register.geonorge.no/"
            "subregister/metadata-kodelister/kartverket/nasjonal-"
            "temainndeling")
    
        # Set ISO topic category
        f.iso_topic_category = "climatologyMeteorologyAtmosphere"
    
        # Set the correct license
        f.license = "http://spdx.org/licenses/CC-BY-4.0(CC-BY-4.0)"
    
        f.publisher_name = "Norwegian Meteorological Institute"
        f.publisher_type = "institution"
        f.publisher_email = "csw-services@met.no"
        f.publisher_url = "https://www.met.no/"
    
        # Extract time_coverage_start from the time variable. The time
        # variable is in UTC (this can be seen from its metadata), so the
        # timestamp is converted with an explicit UTC time zone; in
        # isoformat the start is 2023-02-01T10:00:00+00:00
        tstart = f.variables['time'][0]  # masked array
        tstart = int(tstart[~tstart.mask].data[0])
        f.time_coverage_start = datetime.fromtimestamp(
            tstart, tz=timezone.utc).isoformat()

        # Extract time_coverage_end from the time variable; in isoformat
        # the end is 2023-02-03T23:00:00+00:00
        tend = f.variables['time'][-1]  # masked array
        tend = int(tend[~tend.mask].data[0])
        f.time_coverage_end = datetime.fromtimestamp(
            tend, tz=timezone.utc).isoformat()
    
        # The date_created is in this case recorded in the history attribute
        # We have hardcoded it here for simplicity
        f.date_created = datetime.strptime("2023-02-01T11:30:05", "%Y-%m-%dT%H:%M:%S").replace(
            tzinfo=timezone.utc).isoformat()
    
        # Set the spatial representation (a MET ACDD extension)
        f.spatial_representation = 'grid'
    
        # Update the conventions attribute to the correct ones
        f.Conventions = 'CF-1.10, ACDD-1.3'
    

To test it yourself, you can do the following:

    $ git clone git@github.com:metno/data-management-handbook.git
    $ cd data-management-handbook/example_scripts
    $ wget https://thredds.met.no/thredds/fileServer/metusers/magnusu/test-2023-02-03/meps_reduced.nc
    $ ./update_meps_file.py meps_reduced.nc
    $ cd <path-to-py-mmd-tools>/script
    $ ./nc2mmd.py -i <path-to-data-management-handbook>/example_scripts/meps_reduced.nc -o .
    $ less meps_reduced.xml  # to see the MMD xml file
    

How to add NetCDF-CF data to thredds

This section contains institution-specific information about how to add NetCDF-CF files to thredds.

MET Norway

To add your NetCDF-CF data to thredds.met.no, follow these steps, and your data will be discoverable via MET Norway’s thredds server (discovery metadata files can then be created with the nc2mmd tool in py-mmd-tools).

1. Store the NetCDF-CF file(s) on the lustre filesystem.
  • If you need help to transfer data, file a ticket at hjelp.met.no
  • Make sure that you have sufficient quota on lustre for your data, or apply for a quota on hjelp.met.no
  • You can either use your own user space or a group area that you have access to within a project
  • Make sure you add the same data to both the A and B storage rooms if you need redundancy (this requires some extra steps)
2. Then, thredds needs to be able to discover your data:
  • Use hjelp.met.no to file a ticket for adding data to thredds. The full path to your data or to the base directory of your data structure is needed. All data files ending with .nc (or .ncml) below this base directory will be visible on thredds.
  • When thredds configuration updates have been deployed by the thredds administrator, your NetCDF-CF data will become visible on the server.

The base directory of all data available on thredds is https://thredds.met.no/thredds/catalog.html. Here you will find your data in the folder agreed with the thredds administrator. If your files are correctly formatted and follow the CF and ACDD conventions, it is also possible to register the datasets in the discovery metadata catalog. See details in [Creating NetCDF-CF files].

Register and make data available

In order to make a dataset findable, it must be registered in a searchable catalog with appropriate metadata. Metadata Controlled Production is the process of using metadata and notification services to control downstream production. The (meta)data catalog is indexed and exposed through CSW.

At MET Norway, for example, the MET Messaging System (MMS), a message broker, notifies users about the availability of new data. MMS uses CloudEvents to describe Product Events, which are notification messages about new datasets (or products). MMS can be used both to notify downstream production systems, and to make datasets findable, accessible and reusable for the general public. The former can be achieved by supplying the minimum required metadata in a Product Event, whereas the latter requires provisioning of an MMD metadata string to the Product Event.

In order to make a dataset available to the public, it must be registered in a searchable catalog with appropriate metadata. The following needs to be done:

1. Generate an MMD xml file from your NetCDF-CF file
2. Test your MMD xml metadata file
3. Push the MMD xml file to MMS or to the discovery metadata catalog service

Generation of MMD xml file from NetCDF-CF

Clone the py-mmd-tools repo and make a local installation with, e.g., "pip install ." (this should bring in all needed dependencies; we recommend using a virtual environment). Then, generate your MMD xml file as follows:

    cd script
    ./nc2mmd.py -i <your netcdf file> -o <your xml output directory>
    

See ./nc2mmd.py --help for documentation and extra options.

You will find Extensible Stylesheet Language Transformations (XSLT) documents in the MMD repository. These can be used to translate the metadata documents from MMD to other vocabularies, such as ISO 19115:

    ./bin/convert_from_mmd -i <your mmd xml file> -f iso -o <your iso output file name>

Note that the discovery metadata catalog ingestion tool will take care of translations from MMD, so you don’t need to worry about that unless you have special interest in it.

Test the MMD xml file

Install the dmci app, and run the usage example locally. This will return an error message if anything is wrong with your MMD file. You can also test your MMD file via the DMCI API:

    curl --data-binary "@<PATH_TO_MMD_FILE>" https://dmci.s-enda-*.k8s.met.no/v1/validate
    

Push the MMD xml file to the discovery metadata catalog

For development and verification purposes, a producer of data that does not need to go through the event queue should do the following:

1. Push to the staging environment for testing and verification: curl --data-binary "@<PATH_TO_MMD_FILE>" https://dmci.s-enda-*.k8s.met.no/v1/insert, where * should be either dev or staging.
2. Check that the MMD file has been added to the catalog services in staging
3. Push for production (the official catalog): curl --data-binary "@<PATH_TO_MMD_FILE>" https://dmci.s-enda.k8s.met.no/v1/insert
4. Check that the MMD file has been added to the catalog services in production

How to find data

Using OpenSearch

Local test machines

The vagrant-s-enda environment provides OpenSearch support through PyCSW. To test OpenSearch via the browser, start the vagrant-s-enda vm (vagrant up) and go to: http://10.10.10.10/pycsw/csw.py?mode=opensearch&service=CSW&version=2.0.2&request=GetCapabilities

This will return a description document of the catalog service. The URL field in the description document is a template format that can be used to represent a parameterized form of the search. The search client will process the URL template and attempt to replace each instance of a template parameter, generally represented in the form {name}, with a value determined at query time (see the OpenSearch URL template syntax). A question mark following any search parameter means that the parameter is optional.

Online catalog

For searching the online metadata catalog, the base url (http://10.10.10.10/) must be replaced by https://csw.s-enda.k8s.met.no/:

https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results

    OpenSearch examples

    To find all datasets in the catalog:

    • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results

    Or datasets within a given time span:

    • http://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&time=2000-01-01/2020-09-01

    Or datasets within a geographical domain (defined as a box with parameters min_longitude, min_latitude, max_longitude, max_latitude):

    • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&bbox=0,40,10,60

    Or, datasets from any of the Sentinel satellites:

    • https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2&request=GetRecords&elementsetname=full&typenames=csw:Record&resulttype=results&q=sentinel
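
    These queries can also be issued programmatically. The sketch below runs the free-text query above and, assuming PyCSW answers OpenSearch queries with an Atom feed (as it typically does), prints the hit count and the entry titles:

    import requests
    import xml.etree.ElementTree as ET

    # Free-text OpenSearch query for 'sentinel' (same URL as in the example above)
    url = ("https://csw.s-enda.k8s.met.no/?mode=opensearch&service=CSW&version=2.0.2"
           "&request=GetRecords&elementsetname=full&typenames=csw:Record"
           "&resulttype=results&q=sentinel")

    root = ET.fromstring(requests.get(url).content)

    # Standard Atom and OpenSearch namespaces
    ns = {"atom": "http://www.w3.org/2005/Atom",
          "os": "http://a9.com/-/spec/opensearch/1.1/"}

    total = root.find("os:totalResults", ns)
    if total is not None:
        print("Matched records:", total.text)
    for entry in root.findall("atom:entry", ns):
        print("-", entry.findtext("atom:title", default="(no title)", namespaces=ns))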

    Advanced geographical search with OGC CSW

    The PyCSW OpenSearch interface only supports geographical searches for a bounding box. For more advanced geographical searches, one must write specific XML request documents. For example:

    To find all datasets containing a point:

    <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
    <csw:GetRecords
        xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
        xmlns:ogc="http://www.opengis.net/ogc"
        xmlns:gml="http://www.opengis.net/gml"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        service="CSW"
        version="2.0.2"
        resultType="results"
        maxRecords="10"
        outputFormat="application/xml"
        outputSchema="http://www.opengis.net/cat/csw/2.0.2"
        xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
      <csw:Query typeNames="csw:Record">
        <csw:ElementSetName>full</csw:ElementSetName>
        <csw:Constraint version="1.1.0">
          <ogc:Filter>
            <ogc:Contains>
              <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
              <gml:Point>
                <gml:pos srsDimension="2">59.0 4.0</gml:pos>
              </gml:Point>
            </ogc:Contains>
          </ogc:Filter>
        </csw:Constraint>
      </csw:Query>
    </csw:GetRecords>
    
    To find all datasets intersecting a polygon:
    <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
    <csw:GetRecords
       xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
       xmlns:gml="http://www.opengis.net/gml"
       xmlns:ogc="http://www.opengis.net/ogc"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       service="CSW"
       version="2.0.2"
       resultType="results"
       maxRecords="10"
       outputFormat="application/xml"
       outputSchema="http://www.opengis.net/cat/csw/2.0.2"
       xsi:schemaLocation="http://www.opengis.net/cat/csw/2.0.2 http://schemas.opengis.net/csw/2.0.2/CSW-discovery.xsd" >
     <csw:Query typeNames="csw:Record">
       <csw:ElementSetName>full</csw:ElementSetName>
       <csw:Constraint version="1.1.0">
         <ogc:Filter>
           <ogc:Intersects>
             <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
             <gml:Polygon>
               <gml:exterior>
                 <gml:LinearRing>
                   <gml:posList>
                     47.00 -5.00 55.00 -5.00 55.00 20.00 47.00 20.00 47.00 -5.00
                   </gml:posList>
                 </gml:LinearRing>
               </gml:exterior>
             </gml:Polygon>
           </ogc:Intersects>
         </ogc:Filter>
       </csw:Constraint>
     </csw:Query>
    </csw:GetRecords>
    
    Then, you can query the CSW endpoint with, e.g., Python:
    import requests

    # Post one of the GetRecords XML documents above; the file name is a placeholder
    print(requests.post('https://csw.s-enda.k8s.met.no', data=open('my_xml_request.xml').read()).text)
    

    Human Search Interface

    Access the human search interface through the user portals provided for the S-ENDA partners at https://s-enda.github.io/DMH/userPortals.html to find your data via the web browser. For more information about how to use the services, the responsible personnel for each S-ENDA partner are listed at https://s-enda.github.io/DMH/data_management_contacts_tbl.html.

    QGIS

    MET Norway’s S-ENDA CSW catalog service is available at https://data.csw.met.no. This can be used from QGIS as follows:

    1. Select Web > MetaSearch > MetaSearch menu item
    2. Select Services > New
    3. Type, e.g., data.csw.met.no for the name
    4. Type https://data.csw.met.no for the URL

    Under the Search tab, you can then add search parameters, click Search, and get a list of available datasets.

    Access to data without downloading

    It is possible to access NetCDF data directly via URL, without downloading the files. Below we show how to do so using Python and R.

    With Python

    Xarray is required; for more information, visit xarray's documentation site or the getting started guide. If you don't have xarray installed, do so with:

    python -m pip install "xarray[complete]"

    # If you use conda:
    conda install -c conda-forge xarray dask netCDF4 bottleneck
    

    Reproducible example:

    import xarray as xr

    # Open the remote dataset via its URL; only metadata is read at this point
    ds = xr.open_dataset("https://thredds.niva.no/thredds/dodsC/datasets/multisource/msource-inlet.nc")
    ds  # displays basic info on the dataset

    # If you don't need the full dataset, subset it first;
    # that way you don't download all the years
    ds = ds.sel(time="2024")
    ds.to_dataframe()      # convert to a pandas DataFrame
    ds.temperature.plot()  # quick-look plot of the temperature variable
    ds.head()              # first few values along each dimension
    

    With R

    There are several packages for reading NetCDF in R (RNetCDF, ncdf4, raster, stars). The reproducible example below uses tidync:

    library(tidync)

    # URL of the NetCDF file on the THREDDS server
    url <- 'https://thredds.niva.no/thredds/dodsC/datasets/multisource/msource-outlet.nc'

    # Read the NetCDF file from the URL
    dataNiva <- tidync(url)

    # Extract the data as a tibble
    hyper_tibble(dataNiva)
    

    For full examples see:

    Assignment of Digital Object Identifiers (DOIs)

    A DOI is a unique, persistent and web-resolvable identifier (PID) that resolves to a digital resource. The DOI is a key element in making digital resources findable on the web, allowing for their easy discovery. Additionally, DOIs can be used when citing a specific resource on the web.

    DOIs consist of an alphanumeric string made up of a prefix and a suffix. The prefix is assigned to a particular DOI registrant, e.g., MET Norway, whereas the suffix is provided by the registrant for a specific object. The prefix and the suffix are separated by a forward slash, and their combination is unique (e.g., 10.21343/z9n1-qw63).

    DOIs in a FAIR perspective

    The use of DOIs for datasets relates specifically to the Findability and Reusability principles:

    • Findability: Making a resource findable is the first step in allowing data sharing, and is therefore a key point for achieving FAIRness of data. The first FAIR guiding principle states: "F1. (Meta)data are assigned a globally unique and persistent identifier." Assigning a DOI to a digital resource thus adheres to this principle, as the identifier is both globally unique and persistent. The third findability principle, "F3. Metadata clearly and explicitly include the identifier of the data they describe", means that metadata and data should be connected explicitly by mentioning the dataset's globally unique and persistent identifier in the metadata of that dataset. This applies particularly to data formats that do not include metadata in their headers, which usually results in separate files for data and metadata.

    • Reusability: Reusability of data is closely tied to the possibility of citing data. When the DOI is incorporated into a citation, it becomes a guaranteed locator for the cited resource, because the DOI will always resolve to the right web address (URL). Using a DOI thus allows for proper citation of data when it is reused, giving the right credit and visibility to scientists.

    Further reading and resources:
    1. DOI FAQ [https://www.doi.org/faq.html]
    2. FAIR principles paper [https://doi.org/10.1038/sdata.2016.18]
    3. GO FAIR [https://www.go-fair.org/fair-principles]
    4. OpenAIRE [https://www.openaire.eu/how-to-make-your-data-fair]
    

    How-to assign DOIs to your datasets

    It is most practical to assign DOIs to parent datasets. However, any dataset can be assigned a DOI, irrespective of the DOI status of its related datasets. For example, a child dataset may receive its own DOI even if its parent also has one. This may be practical if the dataset is, e.g., used in a scientific publication.

    In order for the DOI registration agency (DataCite) to assign a DOI, the following have to be provided:

    1. A metadata file containing at least the DataCite mandatory elements
    2. A URL to the dataset landing page, which will be the persistent URL to which the DOI resolves

    For MET Norway, the information required for the DataCite metadata file is available in MMD, whereas the DOI landing page is automatically created on the https://adc.met.no/ data portal upon request from the data producer. Only administrators of the data portal have access to this service, and they are thus the only ones who can finalize the process of assigning a DOI. Examples of landing pages created using this approach are available at:

    • https://adc.met.no/landing-page-collector
    Further reading and resources:
    1. DOI at MET [https://adc.met.no/sites/adc.met.no/files/articles/DOIs-at-METNO.pdf]
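
    For illustration only, the sketch below shows how a draft DOI with a minimal metadata record could be registered through the DataCite REST API. All values (DOI, credentials, landing page URL) are placeholders; in practice this step is handled by the data portal administrators as described above:

    import requests

    # Minimal DataCite metadata record (all values are placeholders)
    payload = {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": "10.21343/z9n1-qw63",                  # prefix/suffix
                "creators": [{"name": "Doe, Jane"}],
                "titles": [{"title": "Example dynamic geodata set"}],
                "publisher": "Norwegian Meteorological Institute",
                "publicationYear": 2024,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": "https://adc.met.no/dataset/example",  # landing page
            },
        }
    }

    # Register a draft DOI in the DataCite test environment;
    # the repository account credentials are placeholders
    resp = requests.post(
        "https://api.test.datacite.org/dois",
        json=payload,
        auth=("REPOSITORY_ID", "PASSWORD"),
        headers={"Content-Type": "application/vnd.api+json"},
    )
    print(resp.status_code, resp.json())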
    

    Searching data in the Catalog Service for the Web (CSW) interface

    The Catalog Service for the Web (CSW) interface is a standard for interacting with a catalog service. It is a web service that allows clients to discover, browse, and query metadata about data, services, and other resources. For example, to do a wildcard search, use the following snippet in R:

    library(httr)
    
    headers <- c(
      'Content-Type' = 'application/xml'
    )
    
    body <- "<?xml version=\"1.0\"?>\n
          <csw:GetRecords xmlns:csw=\"http://www.opengis.net/cat/csw/2.0.2\"  
                  xmlns:gmd=\"http://www.isotc211.org/2005/gmd\" 
                  xmlns:ogc=\"http://www.opengis.net/ogc\" 
                  xmlns:gml=\"http://www.opengis.net/gml\" 
                  service=\"CSW\" version=\"2.0.2\"\n
                  resultType=\"results\"  
                  outputSchema=\"http://www.opengis.net/cat/csw/2.0.2\">\n
            <csw:Query typeNames=\"csw:Record\">\n
              <csw:ElementSetName>full</csw:ElementSetName>\n
              <csw:Constraint version=\"1.1.0\">\n
                <ogc:Filter>\n
                  <ogc:PropertyIsLike wildCard=\"*\" 
                            singleChar=\"?\" 
                            escapeChar=\"\\\" 
                            matchCase=\"false\">\n
                    <ogc:PropertyName>csw:AnyText</ogc:PropertyName>\n
                    <ogc:Literal>*snow*</ogc:Literal>\n
                  </ogc:PropertyIsLike>\n
                </ogc:Filter>\n
              </csw:Constraint>\n
            </csw:Query>\n
          </csw:GetRecords>"
    
    res <- VERB("POST", url = "https://www.geonorge.no/geonetwork/srv/nor/csw/csw", body = body, add_headers(headers))
    
    cat(content(res, 'text'))
    

    More examples are available here.
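
    The same wildcard search can also be performed from Python, for example with the OWSLib package. The following is a minimal sketch against the same Geonorge endpoint as in the R example above:

    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike

    # Connect to a CSW 2.0.2 endpoint (here: Geonorge, as in the R example)
    csw = CatalogueServiceWeb("https://www.geonorge.no/geonetwork/srv/nor/csw/csw")

    # Wildcard search for 'snow' in any text field
    query = PropertyIsLike("csw:AnyText", "%snow%")
    csw.getrecords2(constraints=[query], esn="full", maxrecords=10)

    # Print the titles of the matching records
    for rec in csw.records.values():
        print(rec.title)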

    Data Services

    This chapter presents an overview of the free services provided by S-ENDA's partners. Implementation of the services must be in line with each institute's delivery architecture.

    Data services include:

    • Data ingestion, storing the data in the proper locations for long term preservation and sharing;

    • Data cataloging, extracting the relevant information for proper discovery of the data;

    • Configuration of visualisation and data publication services (e.g., OGC WMS and OPeNDAP).

    Implementation

    When planning and implementing data services, a number of external requirements constrain the choices, especially if national and international reuse of solutions is intended for the data in question. At the national level, important constraints are imposed by the national implementation of the INSPIRE directive through Norge digitalt.

    User Portals and Documentation

    A portal is an entry point for data consumers, enabling them to discover and search for datasets and services independently.

    The table below lists relevant portals where data from S-ENDA partners are available.

    User portals relevant to S-ENDA partners

    | User portal | Description | Service type | Data producer |
    | --- | --- | --- | --- |
    | Geonorge | Public map catalogue | Access | Geonorge |
    | Arctic Data Centre | The Arctic Data Centre (ADC) is a service provided by the Norwegian Meteorological Institute (MET) and is a legacy of the International Polar Year (IPY), when MET coordinated operational data streams internationally and research data nationally. | Search, Visualization | MET, NILU, NIVA, NINA, others |
    | data.met.no | Main access point to free and open data from MET Norway | Search, Access, Visualization | MET |
    | NIVA's THREDDS Data Server | Timeseries data. Still under development (test server) | Access | NIVA |
    | Aquamonitor | Water monitoring system | Access, Visualization | NIVA |
    | Vannmiljø | Vannmiljø is the environmental authorities' specialist system for recording and analysing the state of water, and plays a central role in the planning and implementation of all monitoring activities that follow from the water regulations. Data will, however, be available for all cases where information on the state and development of the water environment quality is requested. | Search, Access | Miljødirektoratet, NIVA, others |
    | ICES DOME | Data portal used by OSPAR, HELCOM, AMAP, and Expert Groups in the management of chemical and biological data for regional marine assessments. | Search | NIVA, others |
    | NINA Geodata Portal | Geodata portal | Access, Visualization | NINA |
    | EBAS Data Portal | EBAS is a database infrastructure operated by NILU. Its main objective is to handle, store and disseminate atmospheric composition data generated by international and national frameworks such as EMEP, GAW and ACTRIS. | Search, Access, Visualization | NILU, others |
    | EBAS Near-Real-Time Data | Visualization of NRT data in EBAS | Visualization | NILU, others |
    | ACTRIS Data Portal | Data portal for all data from the Aerosol, Clouds and Trace Gases Research Infrastructure (ACTRIS), where NILU is leading the DVAS and In-Situ Data Centre units | Search, Access, Visualization | NILU, others |
    | NILU's THREDDS Server | THREDDS server for access to EBAS data and NRT products from EBAS and ACTRIS | M2M Access | NILU, others |
    | Luftkvalitetsdatabasen | Data portal with air quality measurements from Norwegian measurement stations | Search, Access, Visualization, Maps | NILU, Miljødirektoratet, Norwegian municipalities |
    | EVDC Data Portal | Data portal for the ESA Validation Data Centre (EVDC), the long-term repository in Europe for archiving and exchange of correlative data for validation of atmospheric composition products from satellite platforms. | Search, Access | NILU |
    Acronyms

    | Acronym | Meaning |
    | --- | --- |
    | ACDD | Attribute Convention for Dataset Discovery (RD5) |
    | ADC | Arctic Data Centre (ADC) |
    | AeN | Arven etter Nansen (English: Nansen Legacy) |
    | BUFR | Binary Universal Form for the Representation of meteorological data. WMO standard format for binary data, particularly for non-gridded data (BUFR) |
    | CDM | Unidata Common Data Model (CDM) |
    | CF | Climate and Forecast Metadata Conventions (CF) |
    | CMS | Content Management System |
    | CSW | Catalog Service for the Web (CSW) |
    | DAP | Data Access Protocol (DAP) |
    | DBMS | DataBase Management System (DBMS) |
    | DIANA | Digital Analysis tool for visualisation of geodata, open source from MET Norway (DIANA) |
    | diana-WMS | WMS implementation in DIANA |
    | DIAS | Copernicus Data and Information Access Services (DIAS) |
    | DIF | Directory Interchange Format of GCMD (DIF) |
    | DLM | Data life cycle management (DLM) |
    | DM | Data Manager |
    | DMH | Data Management Handbook (this document) |
    | DMCG | Data Management Coordination Group |
    | DMP | Data Management Plan (DMP definition, easyDMP tool) |
    | DOI | Digital Object Identifier (DOI) |
    | EBAS | Database with atmospheric measurement data at NILU |
    | eduGAIN | The Global Academic Interfederation Service (eduGAIN) |
    | ENVRI | European Environmental Research Infrastructures (ENVRI) |
    | ENVRI FAIR | "Making the ENV RIs data services FAIR", a proposal to the EU's Horizon 2020 call INFRAEOSC-04 |
    | EOSC | European Open Science Cloud (EOSC) |
    | ERDDAP | NOAA Environmental Research Division Data Access Protocol (ERDDAP) |
    | ESA | European Space Agency (ESA) |
    | ESGF | Earth System Grid Federation (ESGF) |
    | EWC | European Weather Cloud |
    | FAIR | Findability, Accessibility, Interoperability and Reusability (RD3) |
    | FEIDE | Identity Federation of the Norwegian National Research and Education Network (UNINETT) (FEIDE) |
    | FFI | Norwegian Defence Research Establishment (FFI) |
    | FORCE11 | Future of Research Communication and e-Scholarship (FORCE11) |
    | GCMD | Global Change Master Directory (GCMD) |
    | GCW | Global Cryosphere Watch (GCW) |
    | GeoAccessNO | An NFR-funded infrastructure project, 2015- (GeoAccessNO) |
    | GIS | Geographic Information System |
    | GRIB | GRIdded Binary or General Regularly-distributed Information in Binary form. WMO standard file format for gridded data (GRIB) |
    | HDF, HDF5 | Hierarchical Data Format (HDF) |
    | Hyrax | OPeNDAP 4 Data Server (Hyrax) |
    | IMR | Institute of Marine Research (IMR) |
    | INSPIRE | Infrastructure for Spatial Information in Europe (INSPIRE) |
    | ISO 19115 | ISO standard for geospatial metadata (ISO 19115-1:2014) |
    | IPY | International Polar Year (IPY) |
    | JRCC | Joint Rescue Coordination Centre (Hovedredningssentralen) |
    | KDVH | KlimaDataVareHus |
    | KPI | Key Performance Indicator (KPI) |
    | METCIM | MET Norway Crisis and Incident Management (METCIM) |
    | METSIS | MET Norway Scientific Information System |
    | MMD | Met.no Metadata Format (MMD) |
    | MOAI | Meta Open Archives Initiative server (MOAI) |
    | ncWMS | WMS implementation for NetCDF files (ncWMS) |
    | NERSC | Nansen Environmental and Remote Sensing Center (NERSC) |
    | NetCDF | Network Common Data Format (NetCDF) |
    | NetCDF/CF | A common combination of NetCDF file format with CF-compliant attributes |
    | NFR | The Research Council of Norway (NFR) |
    | NILU | Norwegian Institute for Air Research (NILU) |
    | NIVA | Norwegian Institute for Water Research (NIVA) |
    | NMDC | Norwegian Marine Data Centre, NFR-supported infrastructure project 2013-2017 (NMDC) |
    | NorDataNet | Norwegian Scientific Data Network, an NFR-funded project 2015-2020 (NorDataNet) |
    | Norway Digital | Norwegian national spatial data infrastructure organisation (Norway Digital). Norwegian: Norge digitalt |
    | NORMAP | Norwegian Satellite Earth Observation Database for Marine and Polar Research, an NFR-funded project 2010-2016 (NORMAP) |
    | NRPA | Norwegian Radiation Protection Authority (NRPA) |
    | NSDI | National Spatial Data Infrastructure, USA (NSDI) |
    | NVE | Norwegian Water Resources and Energy Directorate (NVE) |
    | NWP | Numerical Weather Prediction |
    | OAI-PMH | Open Archives Initiative - Protocol for Metadata Harvesting (OAI-PMH) |
    | OAIS | Open Archival Information System (OAIS) |
    | OCEANOTRON | Web server dedicated to the dissemination of ocean in situ observation data collections (OCEANOTRON) |
    | OECD | The Organisation for Economic Co-operation and Development (OECD) |
    | OGC | Open Geospatial Consortium (OGC) |
    | OGC O&M | OGC Observations and Measurements standard (OGC O&M) |
    | OLA | Operational-level Agreement (OLA) |
    | OPeNDAP | Open-source Project for a Network Data Access Protocol (OPeNDAP) - reference server implementation |
    | PID | Persistent Identifier (PID) |
    | RM-ODP | Reference Model of Open Distributed Processing (RM-ODP) |
    | PROV | A W3C Working Group on provenance and a Family of Documents (PROV) |
    | SAON | Sustaining Arctic Observing Networks (SAON/IASC) |
    | SDI | Spatial Data Infrastructure |
    | SDN | SeaDataNet, Pan-European infrastructure for ocean & marine data management |
    | SIOS | Svalbard Integrated Arctic Earth Observing System |
    | SIOS-KC | SIOS Knowledge Centre, an NFR-supported project 2015-2018 (SIOS-KC) |
    | SKOS | Simple Knowledge Organization System (SKOS) |
    | SLA | Service-level Agreement (SLA) |
    | SolR | Apache Enterprise search server with a REST-like API (SolR) |
    | StInfoSys | MET Norway's Station Information System |
    | TDS | THREDDS Data Server (TDS) |
    | THREDDS | Thematic Real-time Environmental Distributed Data Services |
    | UNSDI | United Nations Spatial Data Infrastructure (UNSDI) |
    | UUID | Universally Unique Identifier (UUID) |
    | W3C | World Wide Web Consortium (W3C) |
    | WCS | OGC Web Coverage Service (WCS) |
    | WFS | OGC Web Feature Service (WFS) |
    | WIGOS | WMO Integrated Global Observing System (WIGOS) |
    | WIS | WMO Information System (WIS) |
    | WMO | World Meteorological Organisation (WMO) |
    | WMS | OGC Web Map Service (WMS) |
    | WPS | OGC Web Processing Service (WPS) |
    | YOPP | Year of Polar Prediction (YOPP Data Portal) |

    Glossary of Terms and Names

    | Term | Description |
    | --- | --- |
    | Controlled vocabulary | A carefully selected list of terms (words and phrases) controlled by some authority. They are used to tag information elements (such as datasets) to improve searchability. |
    | Data Governance | See chapter 3, Data Governance. |
    | Data life cycle management | "Data life cycle management (DLM) is a policy-based approach to managing the flow of an information system's data throughout its life cycle: from creation and initial storage to the time when it becomes obsolete and is deleted." Excerpt from TechTarget article. Alias: life cycle management. |
    | Data Management Plan (DMP) | "A written document that describes the data you expect to acquire or generate during the course of a research project, how you will manage, describe, analyse, and store those data, and what mechanisms you will use at the end of your project to share and preserve your data." (Stanford Libraries) |
    | Data centre | A combination of a (distributed) data repository and the data availability services and information about them (e.g., a metadata catalog). A data centre may include contributions from several other data centres. |
    | Data management | How datasets are handled by the organisation through the entire value chain, including receiving, storing, metadata management and data retrieval. |
    | Data provenance | "The term 'data provenance' refers to a record trail that accounts for the origin of a piece of data (in a database, document or repository) together with an explanation of how and why it got to the present place." (Gupta, 2009) |
    | Data repository | A set of distributed components that hold the data and ensure they can be queried and accessed according to agreed protocols. This component is also known as a Data Node. |
    | Dataset | A dataset is a pre-defined grouping or collection of related data for an intended use. Datasets may be categorised by: (1) source, such as observations (in situ, remotely sensed) and numerical model projections and analyses; (2) processing level, such as "raw data" (values measured by an instrument), calibrated data, quality-controlled data, derived parameters (preferably with error estimates), and temporally and/or spatially aggregated variables; (3) data type, including point data, sections and profiles, lines and polylines, polygons, gridded data, volume data, and time series (of points, grids, etc.). Data having all of the same characteristics in each category, but different independent variable ranges and/or responding to a specific need, are normally considered part of a single dataset. In the context of data preservation, a dataset consists of the data records and their associated knowledge (information, tools). In practice, our datasets should conform to the Unidata CDM dataset definition, as much as possible. |
    | Dataset - CDM | Common Data Model. A dataset that "may be a NetCDF, HDF5, GRIB, etc. file, an OPeNDAP dataset, a collection of files, or anything else which can be accessed through the NetCDF API." |
    | Dynamic geodata | Data describing geophysical processes which are continuously evolving over time, typically used for monitoring and prediction of the weather, sea, climate and environment. Examples are weather observations, weather forecasts, pollution (environmental toxins) in water, air and sea, information on the drift of cod eggs and salmon lice, water flow in rivers, driving conditions on the roads, and the distribution of sea ice. Dynamic geodata provides important constraints for many decision-making processes and activities in society. |
    | FAIR principles | The four foundational principles of good data management and stewardship: Findability, Accessibility, Interoperability and Reusability. See chapter 2, FAIR. |
    | Geodataloven | Norwegian regulation toward good and efficient access to public geographic information for public and private purposes. |
    | Geonorge | The national website for map data and other location information in Norway. |
    | Geographic Information System (GIS) | System designed to capture, store, manipulate, analyse, manage, and present spatial or geographic data (Clarke, K. C., 1986). GIS systems have lately evolved into distributed Spatial Data Infrastructures (SDI). |
    | Interoperability | The ability of data or tools from non-cooperating resources to integrate or work together with minimal effort. |
    | Linked data | A method of publishing structured data so that they can be interlinked and become more useful through semantic queries, i.e., through machine-machine interactions (see Wikipedia article). |
    | Metadata | See chapter 4.2, Metadata. |
    | Network Common Data Form (NetCDF) | "Set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. It is also a community standard for sharing scientific data." |
    | Semantic web | "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries." (W3C; see Wikipedia article) |
    | Spatial Data Infrastructure | "Spatial Data Infrastructure (SDI) is defined as a framework of policies, institutional arrangements, technologies, data, and people that enables the sharing and effective usage of geographic information by standardising formats and protocols for access and interoperability." (Tonchovska et al., 2012). SDI has evolved from GIS. Among the largest implementations are NSDI in the USA, INSPIRE in Europe, and UNSDI as an effort by the United Nations. For areas in the Arctic, there is arctic-sdi.org. |
    | Unified data management | A common approach to data management in a grouping of separate data management enterprises. |
    | Web service | Web services are used to communicate metadata and data, and to offer processing services. Much effort has been put into standardisation of web services to ensure they are reusable in different contexts. In contrast to web applications, web services communicate with other programs instead of interactively with users (see TechTerms article). |
    | Workflow management | Workflow management is the process of tracking data, software and other actions on data into a new form of the data. It is related to data provenance, but is usually used in the context of workflow management systems. |