Mutation probability of cytochrome P450 based on a genetic algorithm and support vector machine

The support vector machine (SVM), an effective statistical learning method, has been widely used in mutation prediction. Two factors, feature selection and parameter setting, strongly influence the efficiency and accuracy of SVM classification. In this study, we combined a genetic algorithm (GA) with SVM to develop a GA‐SVM program and applied it to human cytochrome P450s (CYP450s), which are important monooxygenases in phase I drug metabolism. The program optimizes features and parameters simultaneously, so that fewer features are used and the overall prediction accuracy is improved. We focus on non‐synonymous single nucleotide polymorphisms (nsSNPs), whose mutations in protein sequences appear to significantly influence drug metabolism. The final predictive model achieves a prediction accuracy of 61% and a cross‐validation accuracy of 73%. The results indicate that the GA‐SVM program is a useful tool for optimizing mutation predictive models of nsSNPs of human CYP450s.
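As a rough illustration of how a GA can optimize features and SVM parameters at the same time, the sketch below encodes a feature mask and two kernel parameters in a single bit-string chromosome. It is not the program described above: the chromosome layout, the parameter names `C` and `gamma`, their search ranges, the GA settings, and the `toy_fitness` function are all assumptions made for illustration. In a real application the fitness would be the cross‐validation accuracy of an SVM trained on the selected features.

```python
import math
import random

N_FEATURES = 8               # hypothetical number of candidate features
PARAM_BITS = 6               # bits per SVM parameter (assumed encoding)
C_RANGE = (-5.0, 15.0)       # assumed search range for log2(C)
GAMMA_RANGE = (-15.0, 3.0)   # assumed search range for log2(gamma)


def bits_to_value(bits, low, high):
    """Map a list of 0/1 bits linearly onto the interval [low, high]."""
    as_int = int("".join(map(str, bits)), 2)
    return low + (high - low) * as_int / (2 ** len(bits) - 1)


def decode(chrom):
    """Split one chromosome into (feature mask, C, gamma)."""
    mask = chrom[:N_FEATURES]
    c_bits = chrom[N_FEATURES:N_FEATURES + PARAM_BITS]
    g_bits = chrom[N_FEATURES + PARAM_BITS:]
    c = 2.0 ** bits_to_value(c_bits, *C_RANGE)
    gamma = 2.0 ** bits_to_value(g_bits, *GAMMA_RANGE)
    return mask, c, gamma


def toy_fitness(chrom):
    """Stand-in for SVM cross-validation accuracy (an assumption).

    Pretends that only the first three features are informative, that
    every extra feature costs a little, and that log2(C) near 5 is best.
    """
    mask, c, _gamma = decode(chrom)
    if not any(mask):
        return 0.0  # no features selected: nothing to train on
    useful = sum(mask[:3]) / 3.0
    extra_penalty = 0.05 * sum(mask[3:])
    param_penalty = 0.01 * abs(math.log2(c) - 5.0)
    return useful - extra_penalty - param_penalty


def run_ga(fitness, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Plain GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    length = N_FEATURES + 2 * PARAM_BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)
        next_pop = [elite[:]]  # elitism: carry the best chromosome over unchanged
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = run_ga(toy_fitness)
    mask, c, gamma = decode(best)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("C =", c, "gamma =", gamma, "fitness =", toy_fitness(best))
```

Because the feature mask and the parameter bits sit in the same chromosome, crossover and mutation explore feature subsets and parameter values together rather than in two separate search stages. The per-feature penalty in `toy_fitness` is one simple way to favor smaller feature sets; a real fitness function might weight accuracy and feature count differently.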
