Delaunay triangulation with partial least squares projection to latent structures: a model for G-protein coupled receptors classification and fast structure recognition

Summary. As an important transmembrane protein family in eukaryotes, G-protein coupled receptors (GPCRs) play a significant role in cellular signal transduction and are important targets for drug design. However, their tertiary structures are very difficult to resolve by X-ray crystallography. In this study, we developed a Delaunay model that constructs a series of simplexes from latent variables to classify GPCR families and projects unknown sequences into principal component space (PC-space) to predict their topology. Computational results show that, for the classification of GPCRs, the method achieves accuracies of 91.0 and 87.6% for Class A and more than 80% for the other three classes in differentiating GPCRs from non-GPCRs, and 70% in discriminating between the four major classes of GPCRs. When recognizing the structure of GPCRs, the N-termini of all sequences are determined correctly. The highest accuracy in predicting transmembrane segments, 99.4%, is achieved for the 7th transmembrane segment of Rhodopsin, with an average error of 2.1 amino acids, the lowest among all segment predictions. This method could provide structural information on a novel GPCR as a tool for experiments and for other GPCR structure prediction algorithms. Academic users should send requests for the MATLAB program for classifying GPCRs and predicting their topology to liml@scu.edu.cn.
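The idea of classifying a sequence by locating its projection inside a simplex can be illustrated with a minimal sketch. It is not the authors' MATLAB program: it assumes the sequences have already been projected onto two latent variables (the coordinates below are made up), uses a brute-force empty-circumcircle test to build the triangulation, and assumes that a class score is interpolated with barycentric weights inside the enclosing triangle. The labels (+1 for GPCR, -1 for non-GPCR) and all function names are hypothetical.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay(points):
    """Brute-force 2-D Delaunay triangulation: keep every triangle whose
    circumcircle contains no other point. Only suitable for small point
    sets without four cocircular points."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k])
        if cc is None:
            continue
        ux, uy, r2 = cc
        others = (p for m, p in enumerate(points) if m not in (i, j, k))
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9 for px, py in others):
            triangles.append((i, j, k))
    return triangles

def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w1 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w2 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w1, w2, 1.0 - w1 - w2

def predict(p, points, labels, triangles):
    """Interpolate the class score of p inside its enclosing triangle.
    Returns None when p lies outside the triangulated region."""
    for i, j, k in triangles:
        w = barycentric(p, points[i], points[j], points[k])
        if min(w) >= -1e-9:  # all weights non-negative: p is inside
            return w[0] * labels[i] + w[1] * labels[j] + w[2] * labels[k]
    return None

# Hypothetical latent-variable scores of four training sequences.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (5.0, 5.0)]
labels = [1.0, 1.0, 1.0, -1.0]  # +1 = GPCR, -1 = non-GPCR (assumed coding)
triangles = delaunay(points)

score = predict((1.0, 1.0), points, labels, triangles)
print("GPCR" if score is not None and score > 0 else "non-GPCR or outside")
```

A positive interpolated score is read here as GPCR and a negative one as non-GPCR; a query that falls outside every simplex returns None and would need separate handling.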
