REGANOR: a gene prediction server for prokaryotic genomes and a database of high quality gene predictions for prokaryotes.

With more than 1,000 prokaryotic genome sequencing projects ongoing or completed, comprehensive comparative analysis of the gene content of these genomes has become feasible. For such comparisons to be meaningful, gene predictions for the individual genomes should be as accurate as possible. Genome annotations still leave room for improvement, and improving them requires automated gene identification methods that cope with the influence of artifacts such as genomic GC content. We present a web server and a database of high-quality gene predictions. The web server is a resource for gene identification in prokaryotic genome sequences and implements our previously described, accurate gene finding method REGANOR. We also provide novel gene predictions for 241 complete, or almost complete, prokaryotic genomes. Using several examples, we demonstrate how this resource can easily be used to identify promising candidates for genes currently missing from genome annotations. All data sets are available online.

Availability: The gene finding server is accessible via https://www.cebitec.uni-bielefeld.de/groups/brf/software/reganor/cgi-bin/reganor_upload.cgi. The server software is available with the GenDB genome annotation system (version 2.2.1 onwards) under the GNU General Public License. The software can be downloaded from https://sourceforge.net/projects/gendb/. More information on installing GenDB and REGANOR, including the system requirements, can be found on the GenDB project page: http://www.cebitec.uni-bielefeld.de/groups/brf/software/wiki/GenDBWiki/AdministratorDocumentation/GenDBInstallation
