Dataset Search: A lightweight, community-built tool to support research data discovery

Sara Mannheimer, Jason A. Clark, Kyle Hagerman, Jakob Schultz, and James Espeland
Montana State University, Bozeman, MT, USA

Full-Length Paper

Objective: Promoting discovery of research data helps archived data realize its potential to advance knowledge. Montana State University (MSU) Dataset Search aims to support discovery and reporting for research datasets created by researchers at institutions.

Methods and Results: The Dataset Search application consists of five core features: a streamlined browse and search interface; a data model based on dataset discovery; a harvesting process for finding and vetting datasets stored in external repositories; an administrative interface for managing the creation, ingest, and maintenance of dataset records; and a dataset visualization interface to demonstrate how data is produced and used by MSU researchers.

Conclusion: The Dataset Search application is designed to be easily customized and implemented by other institutions. Indexes like Dataset Search can improve search and discovery for content archived in data repositories, thereby amplifying the impact and benefits of archived data.

Correspondence: Sara Mannheimer, sara.mannheimer@montana.edu

Received: June 4, 2020
Accepted: September 8, 2020
Published: January 19, 2021

Copyright: © 2021 Mannheimer et al. This is an open access article licensed under the terms of the Creative Commons Attribution License.

Data Availability: Code associated with this paper is available in Zenodo, via GitHub, at: https://doi.org/10.5281/zenodo.4046567. MSU Dataset Search is available at: https://arc.lib.montana.edu/msu-dataset-search.

Disclosures: The authors report no conflict of interest. The substance of this article is based upon a lightning talk at RDAP Summit 2020.
