Advances in Semi-Supervised Alignment-Free Classification of G Protein-Coupled Receptors

G protein-coupled receptors (GPCRs) are integral cell membrane proteins of great relevance for pharmacology due to their role in transducing extracellular signals. The 3-D structure is unknown for most of them, and the investigation of their structure-function relationships usually relies on constructing 3-D receptor models by aligning their amino acid sequences onto those of receptors of known structure. Sequence alignment, however, risks the loss of relevant information. Different approaches have therefore attempted to analyze unaligned (alignment-free) sequences on the basis of amino acid physicochemical properties. In this paper, we use the Auto-Cross Covariance method and compare it to an amino acid composition representation. Novel semi-supervised manifold learning methods are then used to classify the members of class C GPCRs on the basis of the transformed data. This approach is relevant because protein sequences are not always labeled, so methods that classify robustly from a limited number of labels are required.
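As an illustration of the two alignment-free representations named above, the following minimal Python sketch computes an amino acid composition vector and one common formulation of the auto-cross covariance (lagged products of per-residue descriptor values, averaged over the sequence). It is a sketch under stated assumptions, not the implementation used in the paper: the function names, the toy two-value descriptor table, and the choice of maximum lag are hypothetical, and the formulation omits any normalization exponent or per-sequence centering that a given study may apply.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order for the output vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Relative frequency of each standard amino acid in seq (20 values)."""
    counts = Counter(seq)
    n = sum(counts[a] for a in AMINO_ACIDS)  # ignore non-standard symbols
    return [counts[a] / n for a in AMINO_ACIDS]


def acc_transform(seq, descriptors, max_lag):
    """Auto-cross covariance features of a sequence.

    descriptors maps each residue to a list of d numeric properties
    (hypothetical values here; real studies use published scales).
    For every lag in 1..max_lag and every descriptor pair (j, k), the
    feature is the mean of v[i][j] * v[i + lag][k] over the sequence.
    Pairs with j == k are auto covariances, the rest cross covariances.
    Requires len(seq) > max_lag; returns d * d * max_lag values.
    """
    values = [descriptors[a] for a in seq]
    d = len(values[0])
    n = len(values)
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(values[i][j] * values[i + lag][k]
                            for i in range(n - lag))
                features.append(total / (n - lag))
    return features


# Toy descriptor table with made-up values, for illustration only.
TOY_DESCRIPTORS = {"A": [1.0, 0.0], "C": [0.0, 1.0]}

composition = aa_composition("AAC")
acc = acc_transform("ACA", TOY_DESCRIPTORS, max_lag=1)
```

Both functions map a sequence of any length to a fixed-length vector, which is what allows unaligned sequences to be passed to a classifier.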
