Reducing the n-gram feature space of class C GPCRs to subtype-discriminating patterns

Summary: G protein-coupled receptors (GPCRs) are a large and heterogeneous superfamily of receptors that play a key role in the cell as transmitters of extracellular signals. Class C GPCRs, in particular, are of great interest in pharmacology. Because their full 3-D structure is largely unknown, their primary amino acid sequences are used instead to build robust classifiers capable of discriminating between their subtypes. In this paper, we investigate the use of feature selection techniques to build Support Vector Machine (SVM)-based classification models from selected receptor subsequences described as n-grams. We show that this approach to classification is useful for finding class C GPCR subtype-specific motifs.
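To make the n-gram description concrete, the following is a minimal sketch of how a primary sequence could be turned into n-gram features before any feature selection or SVM training. The function names, the choice of n-gram sizes, and the normalization to relative frequencies are illustrative assumptions; the summary does not state which sizes, alphabets, or weightings the paper uses.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count overlapping n-grams (length-n subsequences) in a sequence."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sequence shorter than n yields an empty range and an empty Counter.
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_features(sequence, sizes=(1, 2, 3)):
    """Merge relative n-gram frequencies for several sizes into one dict.

    The sizes and the per-size normalization are assumptions made for
    illustration, not the settings reported in the paper.
    """
    features = {}
    for n in sizes:
        counts = ngram_counts(sequence, n)
        total = sum(counts.values())
        for gram, count in counts.items():
            features[gram] = count / total
    return features


# Toy fragment for illustration only, not a real receptor sequence.
toy_fragment = "MKLLAMKL"
print(ngram_counts(toy_fragment, 2).most_common(3))
```

Each sequence then becomes a sparse vector indexed by n-gram, and a feature selection step would keep only the n-grams that best separate the subtypes before the classifier is trained.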
