The Application of Support Vector Machine to Operon Prediction

In this paper, we apply the least-squares support vector machine (LS-SVM) to operon prediction in Escherichia coli (E. coli), using different combinations of intergenic distance, gene expression data, and phylogenetic profiles. Experimental results show that within-operon (WO) gene pairs tend to have shorter intergenic distances, higher expression correlation coefficients, and much stronger co-evolution between their phylogenetic profiles than transcription unit border (TUB) pairs. We built data sets from WO and TUB pairs, processed the intergenic distances with log-energy entropy, de-noised the Pearson correlation coefficients of each gene pair's expression data with a wavelet transform, and computed the Hamming distance between the two phylogenetic profiles. We then trained the LS-SVM on part of the data and tested the trained classifier on the remainder. The results show that the choice of feature combination affects prediction performance. When intergenic distance, gene expression data, and phylogenetic profile are all used as input to an LS-SVM with a linear kernel, the classifier achieves an accuracy, sensitivity, and specificity of 92.34%, 93.54%, and 90.73%, respectively.
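The per-pair features and the three reported metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the wavelet de-noising and LS-SVM training steps are omitted, and the log-energy form log(d^2) with a value of 0 at d = 0 is an assumption, since the abstract does not give the exact formula. Class labels are assumed to be +1 for WO pairs and -1 for TUB pairs.

```python
import math


def log_energy(distance):
    """Log-energy value of one intergenic distance.

    Assumed form: log(d^2), with 0 returned for d == 0.
    """
    return math.log(distance ** 2) if distance != 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient of two expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def hamming(profile_a, profile_b):
    """Number of positions at which two phylogenetic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def evaluate(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for labels in {+1, -1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of WO pairs recovered
    specificity = tn / (tn + fp)  # fraction of TUB pairs recovered
    return accuracy, sensitivity, specificity
```

Each gene pair would then be described by a three-element feature vector (log-energy distance, correlation, Hamming distance), which is the kind of input a linear-kernel classifier can consume directly.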
