SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models

Remarkable advances in sequencing technology and rising interest in medical and environmental microbiology, biotechnology, and synthetic biology have resulted in a deluge of published microbial genomes. Yet genome annotation, comparison, and modeling remain a major bottleneck in translating sequence information into biological knowledge, so computational analysis tools are continuously being developed for rapid genome annotation and interpretation. The SEED project is among the earliest and most comprehensive resources for prokaryotic genome analysis. Initiated in 2003 as an integration of genomic data and analysis tools, it now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers), four network-based servers that expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED Servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts.
Large integrations of genomic data are among the major intellectual resources driving research in biology, and programmatic access to the SEED data should be of significant use to a broad range of potential users.
