Glossary of Terms and Names

Controlled vocabulary: A carefully selected list of terms (words and phrases) controlled by some authority. The terms are used to tag information elements (such as datasets) to improve searchability; a minimal sketch follows.
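For illustration, a variable in a NetCDF file can be tagged with a term from the CF standard name table, a controlled vocabulary widely used for geophysical data. This is a minimal sketch using the netCDF4 Python library; the file and variable names are illustrative only:

    from netCDF4 import Dataset
    import numpy as np

    # Tag a variable with a term from the CF standard name table
    # (a controlled vocabulary); names here are illustrative.
    with Dataset("example.nc", "w") as nc:
        nc.createDimension("time", 3)
        temp = nc.createVariable("temperature", "f4", ("time",))
        temp.standard_name = "air_temperature"  # controlled-vocabulary term
        temp.units = "K"
        temp[:] = np.array([271.3, 272.1, 273.0], dtype="f4")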
Data Governance: See chapter 3. Data Governance.
Data life cycle management: “Data life cycle management (DLM) is a policy-based approach to managing the flow of an information system’s data throughout its life cycle: from creation and initial storage to the time when it becomes obsolete and is deleted.” (Excerpt from a TechTarget article.) Alias: life cycle management.
Data Management Plan (DMP): “A written document that describes the data you expect to acquire or generate during the course of a research project, how you will manage, describe, analyse, and store those data, and what mechanisms you will use at the end of your project to share and preserve your data.” (Stanford Libraries.)
Data centre: A combination of a (distributed) data repository, the services that make its data available, and information about both (e.g., a metadata catalogue). A data centre may include contributions from several other data centres.
Data management: How datasets are handled by the organisation through the entire value chain, including receiving, storing, metadata management and data retrieval.
Data provenance: “The term ‘data provenance’ refers to a record trail that accounts for the origin of a piece of data (in a database, document or repository) together with an explanation of how and why it got to the present place.” (Gupta, 2009.)
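As an illustrative sketch only (the field names below are hypothetical and do not follow a formal standard such as W3C PROV), such a record trail for a derived dataset could be kept as a simple list of processing steps:

    # Hypothetical provenance trail: each entry records an action,
    # the agent that performed it, and the data it consumed/produced.
    provenance = [
        {"action": "measured", "agent": "station_A", "output": "raw_temp.csv"},
        {"action": "quality_controlled", "agent": "qc_tool", "input": "raw_temp.csv", "output": "qc_temp.csv"},
        {"action": "aggregated_daily", "agent": "aggregate.py", "input": "qc_temp.csv", "output": "daily_temp.nc"},
    ]
    for step in provenance:
        print(step["action"], "->", step.get("output"))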
Data repository: A set of distributed components that hold the data and ensure they can be queried and accessed according to agreed protocols. This component is also known as a Data Node.
Dataset: A pre-defined grouping or collection of related data for an intended use. Datasets may be categorised by:
  • Source, such as observations (in situ, remotely sensed) and numerical model projections and analyses;
  • Processing level, such as “raw data” (values measured by an instrument), calibrated data, quality-controlled data, derived parameters (preferably with error estimates), temporally and/or spatially aggregated variables;
  • Data type, including point data, sections and profiles, lines and polylines, polygons, gridded data, volume data, and time series (of points, grids, etc.).
Data having all of the same characteristics in each category, but covering different independent variable ranges and/or responding to a specific need, are normally considered part of a single dataset. In the context of data preservation, a dataset consists of the data records and their associated knowledge (information, tools). In practice, our datasets should conform to the Unidata CDM dataset definition as much as possible.
Dataset - CDM: Common Data Model. A CDM dataset “may be a NetCDF, HDF5, GRIB, etc. file, an OPeNDAP dataset, a collection of files, or anything else which can be accessed through the NetCDF API.”
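For example, the netCDF4 Python library opens a local file and a remote OPeNDAP endpoint through the same API; the URL below is a placeholder, not a real service:

    from netCDF4 import Dataset

    # Local files and remote OPeNDAP endpoints are opened the same way;
    # the URL is a placeholder for an actual OPeNDAP service.
    ds = Dataset("https://example.org/thredds/dodsC/some/dataset")
    print(ds.variables.keys())  # inspect the variables the dataset offers
    ds.close()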
Dynamic geodata: Data describing geophysical processes that evolve continuously over time; that is, weather, environment and climate-related data that change in space and time and thus describe processes in nature. Such data are typically used for monitoring and prediction of the weather, sea, climate and environment. Examples are weather observations, weather forecasts, pollution (environmental toxins) in water, air and sea, information on the drift of cod eggs and salmon lice, water flow in rivers, driving conditions on the roads, and the distribution of sea ice. Dynamic geodata provides important constraints for many decision-making processes and activities in society.
FAIR principles: The four foundational principles of good data management and stewardship: Findability, Accessibility, Interoperability and Reusability. See chapter 2. FAIR.
Geodataloven: Norwegian act (the Geodata Act) ensuring good and efficient access to public geographic information for public and private purposes.
Geonorge: The national website for map data and other location information in Norway.
Geographic Information System (GIS): A system designed to capture, store, manipulate, analyse, manage, and present spatial or geographic data (Clarke, K. C., 1986). GIS systems have lately evolved into distributed Spatial Data Infrastructures (SDI).
Interoperability: The ability of data or tools from non-cooperating resources to integrate or work together with minimal effort.
Linked data: A method of publishing structured data so that they can be interlinked and become more useful through semantic queries, i.e., through machine-to-machine interactions (see the Wikipedia article).
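A minimal sketch with the rdflib Python library (the example.org identifiers are placeholders): two statements are interlinked through a shared identifier and then retrieved with a semantic (SPARQL) query:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import DCTERMS

    ex = Namespace("http://example.org/")
    g = Graph()
    # Two statements about the same resource, linked via its identifier.
    g.add((ex.dataset1, DCTERMS.title, Literal("Sea ice extent")))
    g.add((ex.dataset1, DCTERMS.creator, ex.some_institute))

    # A semantic query (SPARQL), answered machine-to-machine.
    query = "SELECT ?title WHERE { ?d <http://purl.org/dc/terms/title> ?title }"
    for row in g.query(query):
        print(row.title)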
Metadata: See chapter 4.2. Metadata.
Network Common Data Form (NetCDF): “Set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. It is also a community standard for sharing scientific data.”
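A minimal sketch of array-oriented access through the netCDF4 Python library, assuming a local file example.nc such as the one sketched under Controlled vocabulary above:

    from netCDF4 import Dataset

    # Open an existing NetCDF file, list its variables, read one array.
    with Dataset("example.nc") as nc:
        print(nc.variables.keys())
        temperature = nc.variables["temperature"][:]
        print(temperature.mean())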
Semantic web: “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.” (W3C; see the Wikipedia article.)
Spatial Data Infrastructure: “Spatial Data Infrastructure (SDI) is defined as a framework of policies, institutional arrangements, technologies, data, and people that enables the sharing and effective usage of geographic information by standardising formats and protocols for access and interoperability.” (Tonchovska et al., 2012.) SDI has evolved from GIS. Among the largest implementations are NSDI in the USA, INSPIRE in Europe, and UNSDI as an effort by the United Nations. For areas in the Arctic, there is arctic-sdi.org.
Unified data management: A common approach to data management within a group of separate data management enterprises.
Web service: Web services are used to communicate metadata and data, and to offer processing services. Much effort has been put into the standardisation of web services to ensure they are reusable in different contexts. In contrast to web applications, web services communicate with other programs rather than interactively with users. (See the TechTerms article.)
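As a sketch, a client program might request the standardised GetCapabilities document from an OGC Web Map Service; the endpoint URL below is a hypothetical placeholder:

    import requests

    # Ask a (hypothetical) OGC Web Map Service to describe itself; the
    # GetCapabilities request is part of the standardised WMS protocol.
    response = requests.get(
        "https://example.org/wms",  # placeholder endpoint
        params={"service": "WMS", "request": "GetCapabilities"},
    )
    print(response.status_code)
    print(response.text[:200])  # start of the XML capabilities document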
Workflow management: The process of tracking the data, software, and other actions that transform data into new forms. It is related to data provenance, but the term is usually used in the context of workflow management systems.