Metadata and ontologies

Metadata for research data is information that makes it possible to identify and understand who created the data, where it comes from, and what it pertains to.

Metadata should be readable by both machines and humans. Standardized metadata is crucial for making data findable, accessible, interoperable, and reusable (FAIR).

Metadata can be relevant at multiple levels. At the dataset level, it is always relevant, while at the file level, the need for metadata may vary depending on the field of study.

Standardized metadata

There are both general and domain-specific metadata standards for research data, but in most cases no standard will fit the data you need to describe exactly. Where possible, place metadata in standardized fields; this allows information to be exchanged, for instance between search services.

Research data archives use different standards, often in somewhat adapted form. The choice of archive will therefore often dictate which metadata standard you should apply.

  • Sikt Arkiv (formerly NSD) uses the metadata standard "Data Documentation Initiative" (DDI), which is tailored to the social sciences and survey data. DDI is developed and maintained within a network of similar archives.
  • DataverseNO uses Dataverse's block-based metadata scheme, which prioritizes data exchange through export (JSON and XML) and mapping to the most commonly used standards (DDI, Dublin Core, DataCite, etc.), while also incorporating certain domain-specific fields. The scheme is generic and was developed for research data, and it is being further adapted to different research fields through the development of new metadata blocks. It is used in many Dataverse archives worldwide and also forms the basis for the IT department's metadata tool.
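
To illustrate what placing metadata in standardized fields and exporting it as JSON and XML can look like, here is a minimal sketch in Python. It assumes a hypothetical dataset record whose keys happen to match Dublin Core element names (title, creator, date, subject, description); the record contents and the wrapper element name "metadata" are invented for illustration. This is not the export format of DataverseNO or Sikt, only a sketch of the general idea.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: names and values are invented for illustration.
record = {
    "title": "Example survey on commuting habits",
    "creator": ["Hansen, Kari", "Olsen, Per"],
    "date": "2024-09-24",
    "subject": ["commuting", "transport"],
    "description": "Illustrative dataset description.",
}


def to_json(rec):
    """Serialize the record as human-readable JSON."""
    return json.dumps(rec, indent=2, ensure_ascii=False)


def to_dublin_core_xml(rec):
    """Serialize the record as XML, one dc:* element per value."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # assumed wrapper element name
    for field, value in rec.items():
        # Repeatable fields (creator, subject) are stored as lists.
        values = value if isinstance(value, list) else [value]
        for v in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            element.text = v
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(record))
    print(to_dublin_core_xml(record))
```

Because both outputs are derived from the same standardized fields, another system that understands those field names can read the record without knowing anything about the local project.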

Should you create your own metadata standard?

It is extremely resource-intensive for a research environment to take on the development, implementation, and maintenance of a new standard. The advantage of using an existing standard is so significant that in the vast majority of cases, it makes sense to use a generic standard for research data metadata rather than considering a custom adaptation.

Ontologies or vocabularies

By using controlled vocabularies, also called ontologies, in data descriptions, you can point to a precise, shared definition of a term. This can be relevant at the dataset level, but it is most useful at the data point level within files, where it identifies precisely what each variable or value represents.

Using ontologies at the variable level enables the identification of commonalities across larger datasets.

Ontologies and URIs (Uniform Resource Identifiers) are used to create linked open data. This has significant advantages if you have large amounts of standardized data and/or wish to combine data from different sources.
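
As a rough sketch of how variable-level annotation helps when combining sources, the Python example below maps the local column names of two hypothetical datasets to concept URIs and then finds the concepts both datasets share. The dataset contents, the function name shared_concepts, and the example.org URIs are all placeholders invented for illustration; in practice the URIs would come from a published ontology.

```python
# Hypothetical variable annotations: local column name -> concept URI.
# The example.org URIs are placeholders, not terms from a real ontology.
dataset_a = {
    "temp_c": "https://example.org/onto/air_temperature",
    "hum": "https://example.org/onto/relative_humidity",
}
dataset_b = {
    "lufttemperatur": "https://example.org/onto/air_temperature",
    "wind": "https://example.org/onto/wind_speed",
}


def shared_concepts(a, b):
    """Return {concept_uri: (column_in_a, column_in_b)} for shared concepts."""
    # Invert b so that columns can be looked up by concept URI.
    column_by_uri_b = {uri: column for column, uri in b.items()}
    return {
        uri: (column, column_by_uri_b[uri])
        for column, uri in a.items()
        if uri in column_by_uri_b
    }


if __name__ == "__main__":
    for uri, (col_a, col_b) in shared_concepts(dataset_a, dataset_b).items():
        print(f"{col_a} and {col_b} both describe {uri}")
```

The two columns have different names in different languages, yet they can be matched automatically because they point to the same URI.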

Examples of fields where ontologies are used in this manner include:

Subject keywords in metadata

At the dataset level, keywords can be made interoperable by using controlled subject keywords in the description of the dataset. There are many different vocabularies available:

  • OLS (Ontology Lookup Service)
  • MeSH (Medical Subject Headings) in the field of medicine and health sciences
  • AGROVOC in agriculture and plant-related research
  • Humord: a Norwegian thesaurus for the humanities, the social sciences, and related disciplines, managed by the University Library.
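
A controlled subject keyword is typically recorded together with the vocabulary it comes from and an identifier for the term. The sketch below shows one possible way to model this in Python and to flag keywords that are still free text. The class Keyword, its field names, and the example.org term URI are assumptions made for illustration and do not reflect the schema of any particular archive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Keyword:
    """A subject keyword, optionally tied to a controlled vocabulary."""

    label: str
    vocabulary: Optional[str] = None  # e.g. "MeSH", "AGROVOC", "Humord"
    term_uri: Optional[str] = None    # identifier of the term in that vocabulary

    @property
    def is_controlled(self) -> bool:
        # A keyword counts as controlled only if both parts are present.
        return self.vocabulary is not None and self.term_uri is not None


# Hypothetical keywords; the URI is a placeholder, not a real AGROVOC term.
keywords = [
    Keyword("Soil erosion", "AGROVOC", "https://example.org/agrovoc/placeholder-term"),
    Keyword("my field notes"),
]


def uncontrolled(kws):
    """Return the labels of keywords that lack a vocabulary or a term URI."""
    return [k.label for k in kws if not k.is_controlled]


if __name__ == "__main__":
    print(uncontrolled(keywords))
```

A check like this can be run before depositing a dataset to see which keywords could still be replaced with terms from one of the vocabularies listed above.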

 


Published Sep. 24, 2024 9:35 AM - Last modified Sep. 24, 2024 10:21 AM