A Probabilistic Learning Approach to

We present a computational approach to predicting operons in the genomes of prokaryotic organisms. Our approach uses machine learning methods to induce predictive models for this task from a rich variety of data types, including sequence data, gene expression data, and functional annotations associated with genes. We use multiple learned models that individually predict promoters, terminators, and operons themselves. A key part of our approach is a dynamic programming method that uses these predictions to assign every known and putative gene in a given genome to its most probable operon. We evaluate our approach using data from the E. coli K-12 genome.
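
The dynamic programming step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the recurrence, so everything below is an assumption: genes are taken as an ordered list on one strand, each candidate operon is a contiguous run of genes, candidate scores are treated as independent, and `log_prob` is a hypothetical stand-in for the learned operon model. The names `best_operon_map`, `max_len`, and `toy_log_prob` are illustrative and do not come from the paper.

```python
import math
from typing import Callable, List, Tuple


def best_operon_map(
    n_genes: int,
    log_prob: Callable[[int, int], float],
    max_len: int = 20,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split genes 0..n_genes-1 into contiguous runs with maximal total log-probability.

    log_prob(i, j) is a hypothetical scorer (an assumption, not the paper's model):
    the log-probability that genes i..j, inclusive, form a single operon.
    """
    # best[k] = best total score for the first k genes; back[k] = start of the last run.
    best = [-math.inf] * (n_genes + 1)
    back = [0] * (n_genes + 1)
    best[0] = 0.0
    for end in range(1, n_genes + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + log_prob(start, end - 1)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Trace back from the last gene to recover the chosen runs.
    operons: List[Tuple[int, int]] = []
    end = n_genes
    while end > 0:
        start = back[end]
        operons.append((start, end - 1))
        end = start
    operons.reverse()
    return best[n_genes], operons


# Toy scores for four genes (made-up numbers, for illustration only).
TOY_PROBS = {
    (0, 1): 0.9, (2, 3): 0.8,
    (0, 0): 0.3, (1, 1): 0.3, (2, 2): 0.4, (3, 3): 0.4,
}


def toy_log_prob(i: int, j: int) -> float:
    # Any candidate not listed above gets a small default probability.
    return math.log(TOY_PROBS.get((i, j), 0.01))


score, operons = best_operon_map(4, toy_log_prob)
```

With these toy scores the sketch groups genes 0-1 and 2-3 into two operons. Working in log space keeps the product of probabilities numerically stable, and `max_len` bounds the inner loop so the cost grows linearly with genome size for a fixed maximum operon length.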
