SVM-Based Classification of Class C GPCRs from Alignment-Free Physicochemical Transformations of Their Sequences

G protein-coupled receptors (GPCRs) play a key role in regulating cell function because they transmit extracellular signals. Since the 3D structure and the function of most GPCRs are unknown, there is a need for robust classification models, built from the analysis of their amino acid sequences, for protein homology detection. In this paper, we describe the supervised classification of the different subtypes of class C GPCRs using support vector machines (SVMs). The models are built on several alignment-free transformations of the amino acid sequences based on their physicochemical properties. Previous research using semi-supervised methods on the same data has shown the usefulness of such transformations. The resulting classification models perform robustly, with a Matthews correlation coefficient close to 0.91 and a prediction accuracy close to 0.93.
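
To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows one way a variable-length sequence can be mapped to a fixed-length vector without alignment, and how accuracy and a multi-class Matthews correlation coefficient can be computed from predictions. Several things here are assumptions made for illustration: the Kyte-Doolittle hydropathy index stands in for the physicochemical descriptors used in the paper, the auto-covariance transform stands in for its transformations, and the function names `auto_covariance`, `accuracy`, and `multiclass_mcc` are hypothetical.

```python
from collections import Counter
from math import sqrt

# ASSUMPTION: Kyte-Doolittle hydropathy index, used only as a stand-in
# physicochemical scale; the paper's own descriptors may differ.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, max_lag=3, scale=HYDROPATHY):
    """Map a sequence of any length to max_lag auto-covariance values."""
    # Residues missing from the scale (e.g. 'X') are skipped.
    values = [scale[aa] for aa in seq if aa in scale]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence needs more than max_lag scored residues")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multi-class generalisation of the Matthews correlation coefficient."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    numerator = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    var_true = n * n - sum(c * c for c in true_counts.values())
    var_pred = n * n - sum(c * c for c in pred_counts.values())
    denominator = sqrt(var_true * var_pred)
    # Convention: return 0.0 when the coefficient is undefined
    # (all true labels or all predictions fall in a single class).
    return numerator / denominator if denominator else 0.0
```

Because every sequence yields a vector of length `max_lag`, sequences of different lengths become directly comparable, and the resulting vectors could be passed to any SVM implementation. With several descriptors per residue, the same idea extends to cross-covariances between descriptors.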
