An ensemble classifier based prediction of G-protein-coupled receptor classes in low homology

G-protein-coupled receptors (GPCRs) play an important role in physiological processes and are the targets of more than 50% of marketed drugs. In this research, we use a hybrid of predicted secondary structural features (PSSF) and approximate entropy (ApEn) as the feature selection method for predicting G-protein-coupled receptor classes in low homology. A low-homology dataset is used to validate the objectivity of the proposed method. A classification model based on the fuzzy K-nearest neighbor classifier is used to classify the membrane protein data. To enhance prediction accuracy, we propose an ensemble classifier as the prediction engine. Compared with the previous best-performing method, the success rate is encouraging. The reliable results also suggest that the proposed method could contribute to the characterization of various proteomes and could be further utilized in neuroscience.
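As an illustration of the ApEn feature mentioned above, the following is a minimal sketch of the standard approximate entropy statistic for a numeric series. The abstract does not state how a protein sequence is encoded numerically, nor which embedding dimension `m` or tolerance `r` the method uses, so the function name `approximate_entropy`, the helper `_phi`, and the default parameter values are assumptions made for illustration only.

```python
import math


def _phi(series, m, r):
    """Average log-fraction of length-m windows lying within tolerance r of each window."""
    n = len(series) - m + 1
    windows = [series[i:i + m] for i in range(n)]
    total = 0.0
    for a in windows:
        # Chebyshev (max-coordinate) distance; each window matches itself,
        # so the count is at least 1 and the logarithm is always defined.
        matches = sum(
            1 for b in windows
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += math.log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=0.2):
    """ApEn(m, r) of a numeric series: phi_m(r) - phi_{m+1}(r).

    m and r are illustrative defaults, not values taken from the paper.
    """
    if len(series) < m + 2:
        raise ValueError("series is too short for the chosen m")
    return _phi(series, m, r) - _phi(series, m + 1, r)
```

Under these assumptions, a perfectly regular series yields a value near zero, while an irregular series yields a larger value, which is why the statistic can serve as a compact descriptor of sequence complexity.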
