The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors

G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance to pharmacology. The tertiary structure of the transmembrane domain, a gateway to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment, however, entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences, based on different physicochemical properties of the amino acids. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate between the different class C GPCR types from the transformed data. This approach is relevant because of the existence of orphan proteins, to which type labels should be assigned in a process of deorphanization, or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity, and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering.
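As an illustration of what a basic amino acid composition representation might look like, the following minimal Python sketch maps a sequence of any length to a fixed 20-dimensional vector of relative residue frequencies. The function name `aa_composition`, the alphabet ordering, and the choice to skip non-standard symbols are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the relative frequency of each standard amino acid.

    Hypothetical helper: residues outside the 20-letter alphabet
    (e.g. 'X' or gap symbols) are ignored rather than counted.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acid residues")
    # One entry per amino acid, so every sequence yields a 20-dimensional
    # vector regardless of its length; no alignment is required.
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Toy sequences, not real GPCR data.
    print(aa_composition("ACCA")[:2])
    # Reordering the residues leaves the representation unchanged.
    print(aa_composition("ACCA") == aa_composition("CAAC"))
```

Because only residue counts enter the vector, any permutation of a sequence produces the same representation, which is the sense in which such a transformation ignores the sequence ordering.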
