Using Multi Level Nearest Neighbor Classifiers for G-Protein Coupled Receptor Sub-families Prediction

Prediction based on protein hydrophobicity yields a potentially good classification rate, compared with other composition-based features, for G-protein coupled receptor (GPCR) families and their respective sub-families. In the current study, we use the hydrophobicity of the proteins to obtain a Fourier spectrum of each protein sequence, and this spectrum is then used for classification. The classification of 17 GPCR sub-families is based on the nearest neighbor (NN) method, which is employed at two levels: at level 1 the GPCR super-family is recognized, and at level 2 the sub-families of the predicted super-family are classified. Compared with a support vector machine (SVM), the NN approach shows better performance under both jackknife and independent data set testing. Results are reported with three performance measures, Matthews correlation coefficient (MCC), overall accuracy (ACC) and reliability (R), on both the training and independent data sets. Our results are compared with the overall class accuracies reported for the super-families by an existing technique. The multilevel classifier shows promising performance, achieving an overall ACC and MCC of 97.02% and 0.95 under the jackknife test, and 87.50% and 0.85 on the independent data set test, respectively.
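The two-level scheme described above can be sketched in Python with NumPy. This is a minimal illustration that rests on several assumptions the abstract does not state. It uses the Kyte-Doolittle hydropathy scale, a 256-point FFT magnitude spectrum as the feature vector, and Euclidean distance. It also assumes a k-nearest-neighbor majority vote (hypothetical parameter `k`) at level 1, followed by 1-NN restricted to the predicted super-family at level 2. The names `fourier_features`, `knn_vote` and `TwoLevelNN` are hypothetical.

```python
from collections import Counter

import numpy as np

# Kyte-Doolittle hydropathy scale. ASSUMPTION: the abstract does not name the
# hydrophobicity scale the authors used; this one is only an illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fourier_features(seq: str, n_points: int = 256) -> np.ndarray:
    """Hydrophobicity signal -> unit-normalized FFT magnitude spectrum."""
    # Unknown residue codes (X, B, Z, ...) are mapped to 0.0.
    h = np.array([KD.get(a, 0.0) for a in seq.upper()], dtype=float)
    h -= h.mean()  # drop the DC term so overall composition does not dominate
    # rfft zero-pads or crops to n_points, so every sequence yields a vector
    # of n_points // 2 + 1 bins regardless of its length.
    spec = np.abs(np.fft.rfft(h, n=n_points))
    return spec / (np.linalg.norm(spec) + 1e-12)


def knn_vote(x, X, y, k):
    """Majority label among the k rows of X nearest to x (Euclidean)."""
    d = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    # Counter keeps first-seen order for equal counts, so a tied vote goes
    # to the label of the closer neighbor.
    return Counter(y[nearest].tolist()).most_common(1)[0][0]


class TwoLevelNN:
    """Level 1 predicts the super-family; level 2 searches only its members."""

    def __init__(self, k: int = 3):
        self.k = k  # hypothetical level-1 neighborhood size

    def fit(self, seqs, families, subfamilies):
        self.X = np.vstack([fourier_features(s) for s in seqs])
        self.fam = np.asarray(families)
        self.sub = np.asarray(subfamilies)
        return self

    def predict_one(self, seq):
        x = fourier_features(seq)
        fam = knn_vote(x, self.X, self.fam, self.k)         # level 1
        mask = self.fam == fam                               # predicted family only
        sub = knn_vote(x, self.X[mask], self.sub[mask], 1)   # level 2: 1-NN
        return str(fam), str(sub)
```

With `k=1` at level 1 and the same features at both levels, the hierarchy collapses to a flat 1-NN search, because the overall nearest neighbor already belongs to the predicted super-family. A level-1 vote over several neighbors, or different features per level, is what allows the two-level result to differ from a flat search.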

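The jackknife test and two of the three reported measures can be sketched as well. In the jackknife (leave-one-out) test, each sequence is classified by a model built from all remaining sequences. The sketch below assumes precomputed feature vectors and a plain 1-NN classifier. It computes overall accuracy and a one-versus-rest Matthews correlation coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), for a chosen class. The reliability measure R is not sketched because its exact definition is not given here. The function names are hypothetical.

```python
import math

import numpy as np


def jackknife_predictions(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Leave-one-out 1-NN: each sample is classified by all the others."""
    preds = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # the held-out sample may not be its own neighbor
        preds.append(y[int(np.argmin(d))])
    return np.asarray(preds)


def accuracy(y_true, y_pred) -> float:
    """Overall accuracy (ACC): fraction of correctly labeled samples."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def mcc(y_true, y_pred, cls) -> float:
    """Matthews correlation coefficient for class `cls` versus the rest."""
    t = np.asarray(y_true) == cls
    p = np.asarray(y_pred) == cls
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
```

For example, true labels `["A", "A", "B", "B"]` predicted as `["A", "B", "B", "B"]` give an accuracy of 0.75 and an MCC for class `"A"` of 1/√3 ≈ 0.577.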