Improving the discoverability, accessibility, and citability of omics datasets: a case report

Although omics datasets represent valuable assets for hypothesis generation, model testing, and data validation, the infrastructure supporting their reuse lacks organization and consistency. Using nuclear receptor signaling transcriptomic datasets as proof of principle, we developed a model to improve the discoverability, accessibility, and citability of published omics datasets. Primary datasets were retrieved from archives, processed to extract data points, then subjected to metadata enrichment and gap filling. The resulting secondary datasets were exposed on responsive web pages to support mining of gene lists, discovery of related datasets, and single-click citation integration with popular reference managers. Automated processes were established to embed digital object identifier-driven links to the secondary datasets in associated journal articles, small molecule and gene-centric databases, and a dataset search engine. Our model creates multiple points of access to reprocessed and reannotated derivative datasets across the digital biomedical research ecosystem, promoting their visibility and usability across disparate research communities.
