Operon prediction based on SVM

An operon is a specific functional organization of genes found in bacterial genomes, and most genes within an operon share common features. Here a support vector machine (SVM) is used to predict operons at the genomic level. Four features were chosen as the SVM input vector: the intergenic distance, the number of common pathways, the number of conserved gene pairs, and the mutual information of phylogenetic profiles. The analysis shows that these properties are indeed characteristic of genes within operons and differ from those of non-operonic genes. Jackknife testing indicates that these input features, used with an RBF-kernel SVM, achieve high accuracy. To validate the method, Escherichia coli K12 and Bacillus subtilis were taken as benchmark genomes of known operon structure; the predictions for both show that the SVM detects operon genes in target genomes efficiently and offers a satisfactory balance between sensitivity and specificity.
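
Of the four features, the mutual information of phylogenetic profiles is the least self-explanatory, so a minimal sketch may help. The abstract does not say how the profiles are encoded or how the score is computed; the code below assumes binary presence/absence profiles over a common set of reference genomes and the standard Shannon mutual information in bits. The function name `mutual_information` and the example profiles are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(profile_a, profile_b):
    """Shannon mutual information (in bits) between two phylogenetic profiles.

    Assumption: each profile is a sequence of 0/1 flags, one per reference
    genome, marking whether a homolog of the gene is present in that genome.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    n = len(profile_a)
    if n == 0:
        raise ValueError("profiles must not be empty")

    joint = Counter(zip(profile_a, profile_b))  # counts of (a, b) pairs
    count_a = Counter(profile_a)                # marginal counts for gene A
    count_b = Counter(profile_b)                # marginal counts for gene B

    mi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b) simplifies to n_ab * n / (count_a * count_b)
        mi += p_ab * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical profiles over eight reference genomes.
gene_x = [1, 1, 0, 0, 1, 1, 0, 0]
gene_y = [1, 1, 0, 0, 1, 1, 0, 0]  # same presence/absence pattern as gene_x
gene_z = [1, 0, 1, 0, 1, 0, 1, 0]  # pattern statistically independent of gene_x

print(mutual_information(gene_x, gene_y))  # 1.0
print(mutual_information(gene_x, gene_z))  # 0.0
```

Under these assumptions, an adjacent gene pair with a high score has co-occurred across genomes more often than chance would predict, which is the kind of signal that would be passed to the SVM alongside the other three features.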
