The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.
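
The annotation cycle described above (annotate a submitted genome against the family collection, then fold public genomes back into that collection) can be illustrated with a toy sketch. This is not the RAST pipeline or the SEED API: the class `FamilyCollection`, the function `annotation_cycle`, the shared k-mer score, the `min_shared` threshold and the example sequences are all hypothetical, chosen only to make the feedback loop concrete.

```python
K = 3  # toy k-mer length (assumption, not a RAST parameter)


def kmers(seq, k=K):
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class FamilyCollection:
    """Hypothetical stand-in for a collection of protein families."""

    def __init__(self):
        self.members = {}  # function name -> list of member sequences

    def add(self, function, seq):
        self.members.setdefault(function, []).append(seq)

    def annotate(self, seq, min_shared=2):
        """Assign the function of the family sharing the most k-mers."""
        query = kmers(seq)
        best, best_score = None, 0
        for function, seqs in self.members.items():
            score = max(len(query & kmers(s)) for s in seqs)
            if score > best_score:
                best, best_score = function, score
        return best if best_score >= min_shared else "hypothetical protein"


def annotation_cycle(collection, proteins, make_public):
    """Annotate a genome's proteins; if public, add them to the families."""
    calls = {pid: collection.annotate(seq) for pid, seq in proteins.items()}
    if make_public:
        for pid, function in calls.items():
            if function != "hypothetical protein":
                collection.add(function, proteins[pid])
    return calls


collection = FamilyCollection()
collection.add("DNA polymerase", "MKTAYIAKQR")
genome = {"p1": "MKTAYIAKQL", "p2": "GGGGGGGG"}
calls = annotation_cycle(collection, genome, make_public=True)
```

In this sketch, each public genome enlarges the families that later genomes are compared against, which is the sense in which the cycle feeds itself.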
