Finding useful data across multiple biomedical data repositories using DataMed

The biomedical research community has recognized the value of searching for data across multiple repositories. As part of the US National Institutes of Health (NIH) Big Data to Knowledge (BD2K) initiative, we are working with an international community of researchers, service providers and knowledge experts to develop and test a data index and search engine. Both are built on metadata extracted from data sets held in a range of repositories. DataMed is designed to be, for data, what PubMed has been for the scientific literature. It supports the findability and accessibility of data sets. Together with interoperability and reusability, these two characteristics make up the four FAIR principles, which aim to facilitate knowledge discovery in today's big data–intensive science landscape.