Annotation, comparison and databases for hundreds of bacterial genomes.

The multitude of bacterial genome sequences being determined has opened up a new field of research: comparative genomics. One role of bioinformatics is to assist biologists in extracting biological knowledge from this flood of data. Software designed for the analysis and functional annotation of a single genome has consequently evolved into comparative genomics tools that bring together the information contained in numerous genomes simultaneously. This paper reviews advances in the development of bacterial annotation and comparative analysis tools, and progress in the design of novel database structures for the integration of heterogeneous biological information.
