Federated Arctic data discovery via the open DataONE network using schema.org
We present an open, well-tested system for federated Arctic data discovery that leverages schema.org metadata for datasets. Through the POLDER effort, international Arctic data repositories have focused on adapting and adopting the Science on schema.org guidelines for publishing datasets. This lightweight vocabulary for representing dataset-level metadata is being adopted across data repositories worldwide, driven largely by Google’s promotion. Many repositories also hold more detailed metadata records in a diverse set of well-established metadata languages, including the ISO 19115 family of specifications, Ecological Metadata Language, and many others. Schema.org metadata provides a common, lightweight mechanism for spanning these communities of practice alongside the more detailed original metadata. DataONE’s metadata harvesting system handles and harmonizes all of these metadata formats. The DataONE harvest engine is a scalable, flexible, open-source system that handles multiple protocols and specifications. A simple route for repositories to become discoverable through DataONE is to provide a sitemap listing schema.org entries. Repositories can alternatively use the other harvest protocols that DataONE supports, or the DataONE API. Regardless of the transfer protocol, once metadata is harvested, DataONE’s indexing system validates the content against published schemas, harmonizes the vocabulary into a consolidated search index, and provides a suite of services such as assessing FAIR metadata quality and reporting on data access and citation. Indexed metadata is available through the DataONE search service for programmatic access, through augmented schema.org entries, and through the DataONE website. The search interface provides thematic, spatial, temporal, and project-based searches across the Arctic. We leverage DataONE’s system for building custom data portals to provide a federated Arctic data search portal.
We also give an overview of the DataONE harvester for schema.org, along with the challenges encountered in scaling harmonized metadata indexing across dozens of diverse Arctic repositories.
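
As a rough illustration of the sitemap route described above, the following Python sketch reads a sitemap, then extracts schema.org Dataset records embedded as JSON-LD in a landing page. It is not DataONE's harvester code; the repository URLs, the sample records, and the names sitemap_entries, JsonLdExtractor, and datasets_in are all hypothetical and chosen only for this example. The inputs are inline strings so the sketch runs without network access.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap that a repository might publish (illustrative URLs).
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://repo.example.org/datasets/ice-cores-2020</loc>
    <lastmod>2021-03-04</lastmod>
  </url>
  <url>
    <loc>https://repo.example.org/datasets/permafrost-temps</loc>
    <lastmod>2022-11-18</lastmod>
  </url>
</urlset>"""

# Hypothetical landing page carrying a schema.org Dataset as embedded JSON-LD.
LANDING_PAGE = """<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "@id": "https://repo.example.org/datasets/ice-cores-2020",
  "name": "Example ice core measurements",
  "description": "Hypothetical record used only for illustration.",
  "url": "https://repo.example.org/datasets/ice-cores-2020",
  "temporalCoverage": "2019-06-01/2020-08-31",
  "spatialCoverage": {
    "@type": "Place",
    "geo": {"@type": "GeoShape", "box": "70.0 -150.0 75.0 -140.0"}
  }
}
</script></head><body></body></html>"""


def sitemap_entries(xml_text):
    """Return (loc, lastmod) pairs for every <url> element in a sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:
            entries.append((loc.strip(), lastmod))
    return entries


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def datasets_in(html_text):
    """Return the JSON-LD blocks in a page whose @type includes Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    found = []
    for block in parser.blocks:
        types = block.get("@type", []) if isinstance(block, dict) else []
        if isinstance(types, str):
            types = [types]
        if "Dataset" in types:
            found.append(block)
    return found


# Usage: list the sitemap entries, then read the Dataset from one page.
for loc, lastmod in sitemap_entries(SITEMAP_XML):
    print(loc, lastmod)
for dataset in datasets_in(LANDING_PAGE):
    print(dataset["name"], dataset["temporalCoverage"])
```

A production harvester would also need to fetch each listed URL, use lastmod to skip unchanged records, and validate the extracted metadata before indexing; those steps are left out of this sketch.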