A tool for assessing alignment of biomedical data repositories with open, FAIR, citation and trustworthy principles

Increasing attention is being paid to the operation of biomedical data repositories in light of efforts to improve how scientific data is handled and made available for the long term. Multiple groups have produced recommendations for functions that biomedical repositories should support, with many using the requirements of the FAIR data principles as guidelines. However, FAIR is only one set of principles to have arisen from the open science community. It is joined by principles governing open science, data citation and trustworthiness, all of which are important for biomedical data repositories to support. Together, these define a framework for data repositories that we call OFCT: Open, FAIR, Citable and Trustworthy. Here we developed an instrument, built with the open-source PolicyModels toolkit, that attempts to operationalize key aspects of the OFCT principles, and piloted it by evaluating eight biomedical community repositories listed by the NIDDK Information Network (dkNET.org). The sample included both specialist repositories that focus on a particular data type or domain, in this case diabetes and metabolomics, and generalist repositories that accept all data types and domains. The goal of this work was both to gauge how closely the design of current biomedical data repositories aligns with these principles and to augment the dkNET listing with additional information that may be important to investigators trying to choose a repository, for example, whether the repository fully supports data citation. The authors performed the evaluation from March to November 2020 by inspecting documentation and interacting with the sites. Overall, although there was little explicit acknowledgement of any of the OFCT principles in our sample, the majority of repositories provided at least some support for their tenets.
