Southern Ocean Decade & Polar Data Forum Week 2021

Online, 20-24 September 2021: An Ocean of Opportunities

Building a custom ontology to enable advanced search capabilities for MOSAiC expedition datasets archived in the Arctic Data Center

Jasmine J.M. Lai, Mark P. Schildhauer, Samantha R. Csik and Bryce D. Mecum

In October 2020, the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) expedition completed a year-long data collection effort, contributing to our understanding of climate processes in the Central Arctic and, more generally, of global climate change. The Arctic Data Center (ADC) has since worked with MOSAiC researchers to archive, in a custom data portal, all U.S. National Science Foundation-funded data products collected during the expedition, together with their corresponding metadata records (https://arcticdata.io/data-portals/). To make these data easier to find and interpret, the ADC team built a MOSAiC ontology (i.e., a controlled vocabulary expressed in RDF/OWL) that formally defines and inter-relates terms. These terms were then used to semantically annotate the “native” terminology that researchers used in their MOSAiC metadata. The ontology reduces ambiguity and increases the precision of searches, helping users understand what terms mean and refine their queries. The MOSAiC ontology serves as a case study for building custom, project-specific ontologies to improve data transparency and findability. In addition, because of its standard format and open schema, the MOSAiC ontology can be easily shared, revised, and extended to accommodate new types of information or new interpretations of the data contents, and it can be used for annotation without requiring changes to the underlying data or data structures. As a standardized vocabulary, the MOSAiC ontology could be used to annotate all MOSAiC datasets, allowing for enhanced interoperability across the three international repositories (PANGAEA, ARM and ADC) holding MOSAiC data. Alignment of MOSAiC terms with broader community ontologies, such as the Environment Ontology (EnvO) or the NERC vocabulary, would further increase the findability and reusability of these invaluable MOSAiC data resources.
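As an illustration of how semantic annotation can sharpen search, the following minimal Python sketch links researchers' native attribute labels to ontology concepts and expands a query to a concept's subclasses. All concept names, dataset identifiers, and attribute labels are hypothetical placeholders rather than terms from the actual MOSAiC ontology, and plain dictionaries stand in for an RDF/OWL store.

```python
from collections import defaultdict

# Hypothetical concept hierarchy (child -> parent), standing in for
# rdfs:subClassOf relations in an ontology. Not actual MOSAiC terms.
SUBCLASS_OF = {
    "ex:SeaIceTemperature": "ex:Temperature",
    "ex:AirTemperature": "ex:Temperature",
    "ex:Temperature": "ex:Measurement",
}

# Hypothetical annotations: each native attribute label in a dataset's
# metadata is linked to one ontology concept.
ANNOTATIONS = [
    ("dataset-1", "T_ice", "ex:SeaIceTemperature"),
    ("dataset-2", "temp_air_2m", "ex:AirTemperature"),
    ("dataset-3", "sal", "ex:Salinity"),
]


def descendants(concept):
    """Return the concept together with all of its transitive subclasses."""
    children = defaultdict(set)
    for child, parent in SUBCLASS_OF.items():
        children[parent].add(child)
    found, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(concept):
    """Return sorted dataset ids annotated with the concept or a subclass."""
    targets = descendants(concept)
    return sorted({ds for ds, _label, c in ANNOTATIONS if c in targets})


if __name__ == "__main__":
    # A search on the broad concept finds both temperature datasets,
    # even though their native labels ("T_ice", "temp_air_2m") differ.
    print(search("ex:Temperature"))
```

In this sketch the search matches on concepts rather than on label strings, so differently named attributes that share a meaning are retrieved together, and the annotations sit alongside the metadata without altering the underlying data.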