A Bayesian Network Approach to Operon Prediction

MOTIVATION In order to understand transcription regulation in a given prokaryotic genome, it is critical to identify operons, the fundamental units of transcription, in such species. While there are a growing number of organisms whose sequence and gene coordinates are known, by and large their operons are not known. RESULTS We present a probabilistic approach to predicting operons using Bayesian networks. Our approach exploits diverse evidence sources such as sequence and expression data. We evaluate our approach on the Escherichia coli K-12 genome where our results indicate we are able to identify over 78% of its operons at a 10% false positive rate. Also, empirical evaluation using a reduced set of data sources suggests that our approach may have significant value for organisms that do not have as rich of evidence sources as E.coli. AVAILABILITY Our E.coli K-12 operon predictions are available at http://www.biostat.wisc.edu/gene-regulation.

[1]  J. Szustakowski,et al.  Computational identification of operons in microbial genomes. , 2002, Genome research.

[2]  Nils J. Nilsson,et al.  Artificial Intelligence , 1974, IFIP Congress.

[3]  S. Salzberg,et al.  Prediction of operons in microbial genomes. , 2001, Nucleic acids research.

[4]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[5]  R. Overbeek,et al.  The use of gene clusters to infer functional coupling. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[6]  David Page,et al.  A Probabilistic Learning Approach to Whole-Genome Operon Prediction , 2000, ISMB.

[7]  N. W. Davis,et al.  The complete genome sequence of Escherichia coli K-12. , 1997, Science.

[8]  T Yada,et al.  Modeling and predicting transcriptional units of Escherichia coli genes using hidden Markov models. , 1999, Bioinformatics.

[9]  Mark Craven,et al.  Refining the Structure of a Stochastic Context-Free Grammar , 2001, IJCAI.

[10]  S Karlin,et al.  Codon usages in different gene classes of the Escherichia coli genome , 1998, Molecular microbiology.

[11]  James P. Egan,et al.  Signal detection theory and ROC analysis , 1975 .

[12]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Julio Collado-Vides,et al.  RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12 , 2001, Nucleic Acids Res..

[14]  A. Valencia,et al.  Conserved Clusters of Functionally Related Genes in Two Bacterial Genomes , 1997, Journal of Molecular Evolution.

[15]  Julio Collado-Vides,et al.  A powerful non-homology method for the prediction of operons in prokaryotes , 2002, ISMB.

[16]  Temple F. Smith,et al.  Operons in Escherichia coli: genomic analyses and predictions. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[17]  A. Hasman,et al.  Probabilistic reasoning in intelligent systems: Networks of plausible inference , 1991 .

[18]  Julio Collado-Vides,et al.  RegulonDB (version 3.0): transcriptional regulation and operon organization in Escherichia coli K-12 , 2000, Nucleic Acids Res..

[19]  David R. Haynor,et al.  Identifying operons and untranslated regions of transcripts using Escherichia coli RNA expression analysis , 2002, ISMB.

[20]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .

[21]  E. Koonin,et al.  Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. , 2001, Genome research.