Automatic Annotation of Bacterial Community Sequences and Application to Infections Diagnostic

To annotate bacterial sequences from an environmental sample, we have developed an automatic annotation pipeline, Fgenesb_annotator, that includes self-training of gene-finding parameters and prediction of CDS, RNA genes, operons, promoters and terminators. The new version of the pipeline includes frameshift correction and a special module that improves prediction accuracy for ribosomal proteins. To analyze next-generation sequencing data, we have developed the OligoZip assembler and the Transomics pipeline, which address the following tasks: 1) de novo reconstruction of a genomic sequence; 2) reconstruction of a sequence using a reference genome; 3) SNP discovery; 4) mapping RNA-Seq reads to a reference genome, assembling them into alternative transcripts and quantifying the abundance of these transcripts. Using the OligoZip assembler and the Fgenesb pipeline, we have developed a novel computational approach for identifying toxic and nontoxic bacterial serotypes from next-generation sequencing data. It can be used to detect bacterial infections in wounds and bacterial contamination of water or food.
