Computational prediction of operons in Synechococcus sp. WH8102.

We computationally predict operons in the Synechococcus sp. WH8102 genome from three types of genomic data: intergenic distances, COG gene functions and phylogenetic profiles. The proposed method first estimates a log-likelihood distribution for each type of data and then fuses this information with a perceptron to discriminate pairs of genes within operons (WO pairs) from pairs of genes across transcription unit borders (TUB pairs). Computational experiments showed that, compared with TUB pairs, WO pairs tend to have shorter intergenic distances, a higher probability of belonging to the same COG functional category and more similar phylogenetic profiles, indicating that all three data types are informative for operon prediction. Tested on 236 known operons of Escherichia coli K12, the method achieves an overall accuracy of 83.8% by learning jointly from the multiple types of genomic data, whereas the individual information sources yield accuracies of 80.4%, 74.4% and 70.6%, respectively. Using this new approach together with our previous comparative genome analysis-based approach, we predict 556 (putative) operons in WH8102. All predicted data are available for public use at http://www.cs.ucr.edu/~xin/operons.htm.
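The abstract does not say how the log-likelihood distributions are binned or how the perceptron is trained, so the following is only a minimal sketch of the general idea under stated assumptions: each data type is turned into a per-bin log-likelihood ratio log(P(bin | WO) / P(bin | TUB)) estimated from histograms with a pseudocount, phylogenetic profiles are compared by Hamming distance, and the three resulting scores per gene pair are combined by the classic perceptron update rule. All function names, bin edges and toy numbers are hypothetical and are not taken from the paper.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def bin_index(x: float, edges: Sequence[float]) -> int:
    """Bin i covers [edges[i], edges[i+1]); out-of-range values are clamped to the end bins."""
    i = bisect.bisect_right(edges, x) - 1
    return min(max(i, 0), len(edges) - 2)


def log_likelihood_table(wo: Sequence[float], tub: Sequence[float],
                         edges: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Per-bin log(P(bin | WO) / P(bin | TUB)) from labelled gene pairs (assumed histogram estimate).

    The pseudocount keeps empty bins from producing log(0). For a binary feature such as
    "same COG category" (0 or 1), edges=[0, 1, 2] gives one bin per value.
    """
    nbins = len(edges) - 1
    wo_counts = [pseudocount] * nbins
    tub_counts = [pseudocount] * nbins
    for x in wo:
        wo_counts[bin_index(x, edges)] += 1
    for x in tub:
        tub_counts[bin_index(x, edges)] += 1
    wo_total, tub_total = sum(wo_counts), sum(tub_counts)
    return [math.log((w / wo_total) / (t / tub_total))
            for w, t in zip(wo_counts, tub_counts)]


def profile_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Hamming distance between two binary phylogenetic profiles (presence/absence per genome)."""
    return sum(a != b for a, b in zip(p, q))


def train_perceptron(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 100, lr: float = 1.0) -> Tuple[List[float], float]:
    """Classic perceptron rule; labels are +1 for WO pairs and -1 for TUB pairs."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified or on the boundary: move the hyperplane
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                mistakes += 1
        if mistakes == 0:  # every training pair is classified correctly
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (predicted WO pair) or -1 (predicted TUB pair)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


# --- Toy demonstration (invented numbers, not data from the paper) ---
# Intergenic distances in bp for labelled WO and TUB pairs; short gaps favour WO.
dist_table = log_likelihood_table(wo=[10, 20, 30], tub=[200, 300, 400],
                                  edges=[0, 100, 1000])
# Each gene pair becomes a vector of three log-likelihood scores
# (distance, COG, phylogenetic profile); the perceptron learns how to weigh them.
X = [[2.0, 1.0, 1.0], [1.0, 1.0, 2.0], [-1.0, -2.0, -1.0], [-2.0, -1.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # -> [1, 1, -1, -1]
```

A perceptron is used here because the abstract names one; because each input is already a log-likelihood ratio, the learned weights can be read as how much the fused classifier trusts each data source.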