A standard MIGS/MIMS compliant XML Schema: toward the development of the Genomic Contextual Data Markup Language (GCDML).

The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS-compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly typed schema will provide a diverse set of descriptors for the exact origin and processing of a biological sample, from sampling through sequencing to subsequent analysis. Here we describe the need for such a project, outline the design principles required to support it, and make an open call for participation in defining the future content of GCDML. GCDML is freely available and can be downloaded, along with documentation, from the GSC Web site (http://gensc.org).
