Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications

Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.

Lennart Martens | Stefano Toppo | Geoffrey C. Fox | Doron Lancet | Dmitrij Frishman | Peter Uetz | Rui Chen | Larry Smarr | Joseph W. Kemnitz | Allison P. Heath | Harsha Rajasimha | Isaac S. Kohane | Alexey I. Nesvizhskii | Sanjeeva Srivastava | Eugene Kolker | Brynn H. Voy | Todd Smith | Robert Grossman | Gilbert S. Omenn | Roger Higdon | Larissa Stanberry | Elizabeth Stewart | Gregory Yandl | Natali Kolker | Nathaniel Anderson | Kenneth Verheggen | Louise Warnich | Michael Snyder | George Mias | Vural Özdemir | Alexander Kel | Andrey Lisitsa | Matthew E. Monroe | Paola Masuzzo | Santosh Noronha | Sukru Aynacioglu | Wu-Chun Feng | Jean-Claude Marshall | Sean Mooney | Jerry Sheehan | Srikanth Rapole | William Hancock | Gordon A. Anderson | Ancha V. Baranova | Shawn R. Campagna | John Choiniere | Stephen P. Dearth | Lynnette Ferguson | Mara H. Hutz | Imre Janko | Lihua Jiang | Sanjay Joshi | Elaine Lee | Weizhong Li | Adrian Llerena | Courtney MacNealy-Koch | Amanda May | Elizabeth Montague | Preveen Ramamoorthy | Charles V. Smith | Steven W. Wilhelm
