Visual characterization of misclassified Class C GPCRs through manifold-based machine learning methods

G-protein-coupled receptors (GPCRs) are cell membrane proteins of great interest in biology and pharmacology. Previous analyses of Class C of these receptors revealed an upper bound on the accuracy achievable when classifying their standard subtypes from alignment-free transformations of their primary sequences. To investigate this apparent bound further, this paper focuses on the receptor sequences that supervised learning methods previously misclassified. In our experiments, these sequences are visualized using a nonlinear dimensionality reduction technique and phylogenetic trees. They are then characterized against the rest of the data and, in particular, against the remaining cases of their own subtype. This exploratory visualization should help us to discriminate between different types of misclassification and to build hypotheses about database quality problems and about the extent to which GPCR sequence transformations limit subtype discriminability. The reported experiments provide a proof of concept for the proposed method.
