Data standards for Omics data: the basis of data sharing and reuse.

To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
