Classification of G-protein coupled receptors based on support vector machine with maximum relevance minimum redundancy and genetic algorithm

Background
Because prior knowledge of the function of G protein-coupled receptors (GPCRs) can provide useful information for pharmaceutical research, determining their function is an important topic in protein science. However, as GPCR sequences rapidly enter databanks, the gap between the number of known sequences and the number of sequences with known function is widening quickly, and determining function by experimental techniques alone is both time-consuming and expensive. It is therefore important to develop a computational method for fast and accurate classification of GPCRs.

Results
In this study, a novel three-layer predictor based on a support vector machine (SVM) and feature selection is developed for predicting and classifying GPCRs directly from amino acid sequence data. The maximum relevance minimum redundancy (mRMR) criterion is applied to pre-evaluate features for discriminative information, and a genetic algorithm (GA) is used to search for optimized feature subsets. The SVM is used to construct the classification models. The overall accuracies of the three-layer predictor at the superfamily, family and subfamily levels were obtained by cross-validation tests on two non-redundant datasets. These accuracies are about 0.5% to 16% higher than those of GPCR-CA and GPCRPred.

Conclusion
The high success rates indicate that the proposed predictor is a useful automated tool for predicting GPCRs. GPCR-SVMFS, the corresponding executable program for GPCR prediction and classification, is freely available from the authors on request.
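The abstract does not spell out how mRMR scores features. The following is a minimal sketch. It assumes discretized feature values and the greedy difference criterion of Peng, Long and Ding. At each step, the sketch selects the feature f that maximizes I(f; c) minus the mean I(f; s) over the already-selected features s. The names `mutual_information` and `mrmr_rank` and the toy data are hypothetical and are not taken from GPCR-SVMFS.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features, labels, k):
    """Greedily pick k feature indices by the mRMR difference criterion.

    features: list of columns (each a list of discretized values, one per sample)
    labels:   class label per sample
    """
    relevance = [mutual_information(f, labels) for f in features]
    selected, remaining = [], list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]  # first pick: relevance only
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the lowest remaining index
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): 8 samples, 4 classes, 4 discretized features.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # high bit of the class label
    [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of feature 0
    [0, 0, 1, 1, 0, 0, 1, 1],  # low bit of the class label
    [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the class label
]
print(mrmr_rank(features, labels, 2))  # [0, 2]
```

Features 0, 1 and 2 are equally relevant here, carrying 1 bit each. Feature 1 duplicates feature 0, so it is passed over in favour of feature 2, which carries the complementary information. This is the redundancy penalty that distinguishes mRMR from ranking by relevance alone.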

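The abstract does not specify the GA's encoding, operators or fitness function. The following is a minimal sketch. It assumes that each candidate subset is a binary mask over the features, for example the top-ranked features from an mRMR pass. It uses binary tournament selection, one-point crossover, bit-flip mutation and elitism. In the described pipeline, the fitness would presumably be cross-validated SVM accuracy on the masked features. Here, `toy_fitness` and `INFORMATIVE` are hypothetical stand-ins that let the sketch run on its own, and all parameter defaults are illustrative.

```python
import random


def _tournament(population, scores, rng):
    """Binary tournament: return the fitter of two randomly chosen masks."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if scores[a] >= scores[b] else population[b]


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks (1 = feature kept) to maximize fitness(mask).

    Returns (best_mask, best_score). The best mask seen so far is always
    copied into the next generation (elitism).
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")

    for gen in range(generations + 1):
        scores = [fitness(m) for m in population]
        for m, s in zip(population, scores):
            if s > best_score:
                best_mask, best_score = m[:], s
        if gen == generations:
            break  # final population has been scored; do not breed again

        next_gen = [best_mask[:]]
        while len(next_gen) < pop_size:
            p1 = _tournament(population, scores, rng)
            p2 = _tournament(population, scores, rng)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen

    return best_mask, best_score


# Hypothetical stand-in for cross-validated SVM accuracy: reward three
# "informative" features and lightly penalize larger subsets.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


mask, score = genetic_feature_search(8, toy_fitness)
print(mask, round(score, 2))
```

A fitness based on a real classifier should also handle the all-zero mask, which leaves an SVM with no input features.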