A multi-approaches-guided genetic algorithm with application to operon prediction

OBJECTIVE The prediction of operons is critical to reconstructing regulatory networks at the whole-genome level. Multiple genome features have been used to predict operons. However, previous studies usually process all of these features with a single method. The aim of this paper is to develop a combined method for operon prediction that preprocesses each genome feature with a method suited to it, so that the distinctive characteristics of each feature are exploited.

METHODS A novel multi-approach-guided genetic algorithm for operon prediction is presented. We apply different methods to four types of data: intergenic distance, cluster of orthologous groups (COG) gene functions, metabolic pathways, and microarray expression data. A novel local-entropy-minimization method is proposed to partition intergenic distance. By transferring the knowledge obtained from Escherichia coli data, our program can also be applied to other newly sequenced genomes. We calculate the log-likelihood for COG gene functions and the Pearson correlation coefficient for microarray expression data. A genetic algorithm then integrates the four types of data.

RESULTS The proposed method is evaluated on the genomes of E. coli K12, Bacillus subtilis, and Pseudomonas aeruginosa PAO1, with prediction accuracies of 85.9987%, 88.296%, and 81.2384%, respectively.

CONCLUSION Simulated experimental results demonstrate that, within the genetic algorithm, preprocessing genome data with multiple approaches ensures effective use of the different biological characteristics. The experimental results also show that the proposed method is applicable to operon prediction in prokaryotes.
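The abstract does not describe how the local-entropy-minimization partition of intergenic distance works. As a rough illustration only, the following Python sketch applies a standard entropy-minimizing cut (the idea behind supervised discretization) to intergenic distances of adjacent gene pairs labeled as within-operon (1) or operon boundary (0). The single-cut search, the function names `entropy` and `best_split`, and the toy data are hypothetical assumptions, not the authors' procedure.

```python
import math
from typing import Optional, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (bits) of binary labels: 1 = within-operon pair, 0 = boundary pair."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0:
            h -= q * math.log2(q)
    return h


def best_split(distances: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Return the distance cut that minimizes the size-weighted entropy of the two intervals.

    Candidate cuts are midpoints between consecutive distinct distances (hypothetical choice).
    Returns None if all distances are equal.
    """
    pairs = sorted(zip(distances, labels))
    xs = [d for d, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(xs)
    best_cut, best_h = None, float("inf")
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between identical distances
        left, right = ys[:i], ys[i:]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_cut, best_h = (xs[i - 1] + xs[i]) / 2, h
    return best_cut
```

For toy distances `[-20, -4, 10, 25, 40, 150, 220, 300]` with labels `[1, 1, 1, 1, 1, 0, 0, 0]`, the cut falls at 95.0, where both intervals are pure. Applying `best_split` recursively to each interval would give a multi-interval partition; whether the paper's "local" variant does something like this is not stated in the abstract.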
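The log-likelihood score for COG gene functions can be illustrated in a similarly hedged way. The sketch below assumes that each adjacent gene pair is reduced to a hypothetical category ("same" COG functional class, "different", or "unknown" when either gene lacks an assignment). Each category is then scored by the log ratio of its frequency among known operon pairs to its frequency among boundary pairs, with additive smoothing. The categories, the pseudocount, and the function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

CATEGORIES = ("same", "different", "unknown")
Pair = Tuple[Optional[str], Optional[str]]  # COG functional classes of two adjacent genes


def cog_category(cog_a: Optional[str], cog_b: Optional[str]) -> str:
    """Hypothetical reduction of a gene pair to one of three COG-match categories."""
    if cog_a is None or cog_b is None:
        return "unknown"
    return "same" if cog_a == cog_b else "different"


def cog_log_likelihood(operon_pairs: Iterable[Pair], boundary_pairs: Iterable[Pair],
                       pseudocount: float = 1.0) -> Dict[str, float]:
    """log(P(category | operon pair) / P(category | boundary pair)), with additive smoothing."""
    op = Counter(cog_category(a, b) for a, b in operon_pairs)
    bd = Counter(cog_category(a, b) for a, b in boundary_pairs)
    op_total = sum(op.values()) + pseudocount * len(CATEGORIES)
    bd_total = sum(bd.values()) + pseudocount * len(CATEGORIES)
    return {
        c: math.log(((op[c] + pseudocount) / op_total) / ((bd[c] + pseudocount) / bd_total))
        for c in CATEGORIES
    }
```

A positive score means the category is more common within operons than across boundaries, so it counts as evidence for co-transcription; a negative score counts against it.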
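For microarray expression data, the abstract states that the Pearson correlation coefficient is used. A minimal sketch, assuming each gene's expression profile is a list of values measured under the same set of conditions, computes r between the profiles of two adjacent genes; strongly co-expressed neighbors are more likely to share an operon. Returning 0.0 for a flat profile is an assumption of this sketch.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles measured under the same conditions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile gives no co-expression evidence
    return sxy / math.sqrt(sxx * syy)
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient; the explicit version above only makes the formula visible.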
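Finally, the genetic algorithm integrates the four feature scores. The abstract does not give the encoding, operators, or fitness function, so the following is only a generic sketch under stated assumptions. Each gene pair is described by four scores already scaled so that larger values suggest the same operon. A chromosome holds one weight per feature plus a decision threshold, and fitness is classification accuracy on labeled pairs. Tournament selection, uniform crossover, Gaussian mutation, and single-individual elitism are common textbook choices, not necessarily the authors'.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical chromosome: [w_distance, w_cog, w_pathway, w_expression, threshold]
Genome = List[float]
# Each example: (four feature scores, label), label 1 = same operon, 0 = boundary
Example = Tuple[Sequence[float], int]


def predict(genome: Genome, features: Sequence[float]) -> bool:
    """Predict 'same operon' when the weighted feature sum exceeds the threshold."""
    *weights, threshold = genome
    return sum(w * f for w, f in zip(weights, features)) > threshold


def fitness(genome: Genome, data: List[Example]) -> float:
    """Fraction of labeled gene pairs classified correctly."""
    return sum(predict(genome, x) == bool(y) for x, y in data) / len(data)


def tournament(pop: List[Genome], data: List[Example], rng: random.Random, k: int = 3) -> Genome:
    """Pick the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=lambda g: fitness(g, data))


def evolve(data: List[Example], pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.2, mutation_sd: float = 0.2, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    n_genes = len(data[0][0]) + 1  # one weight per feature plus a threshold
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=lambda g: fitness(g, data))
        next_pop = [best[:]]  # elitism: carry the best genome over unchanged
        while len(next_pop) < pop_size:
            p1 = tournament(pop, data, rng)
            p2 = tournament(pop, data, rng)
            # Uniform crossover: each gene comes from either parent with equal probability
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation applied gene by gene
            child = [g + rng.gauss(0.0, mutation_sd) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda g: fitness(g, data))
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases from one generation to the next.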
