The extracellular N-terminal domain suffices to discriminate class C G Protein-Coupled Receptor subtypes from n-grams of their sequences

The investigation of protein functionality often relies on knowledge of the crystal 3-D structure. This structure is not always known or easily unravelled, as is the case for eukaryotic cell membrane proteins such as G Protein-Coupled Receptors (GPCRs), and especially for those of class C, which are the target of the current study. In the absence of information about tertiary or quaternary structure, functionality can be investigated from the primary structure, that is, from the amino acid sequence. In previous research, we found that the different subtypes of class C GPCRs could be discriminated with a high level of accuracy from the n-gram transformation of their complete primary sequences, using a method that combined two-stage feature selection with kernel classifiers. This study aims to discover whether subunits of the complete sequence retain this discriminative capability. We report experiments showing that the extracellular N-terminal domain of the receptor suffices to retain the classification accuracy of the complete sequence, and that it does so using a reduced selection of n-grams of up to five amino acids in length, which opens up an avenue for class C GPCR signature motif discovery.
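As an illustration of what an n-gram transformation of a primary sequence can look like, the following minimal Python sketch counts all overlapping n-grams of up to a given length and turns them into a fixed-length feature vector. It is not the authors' implementation: the function names `ngram_counts` and `relative_frequencies`, the normalization by sequence length, and the toy sequence are assumptions made purely for illustration, and the feature selection and kernel classification stages are not shown.

```python
from collections import Counter


def ngram_counts(sequence, max_n=5):
    """Count every overlapping n-gram of length 1..max_n in a sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(sequence) - n + 1):
            counts[sequence[i:i + n]] += 1
    return counts


def relative_frequencies(sequence, vocabulary, max_n=5):
    """Map a sequence to a vector of n-gram counts divided by its length.

    `vocabulary` is a hypothetical list of selected n-grams; dividing by
    sequence length is an assumed normalization, not taken from the paper.
    """
    counts = ngram_counts(sequence, max_n)
    length = len(sequence)
    if length == 0:
        return [0.0 for _ in vocabulary]
    return [counts[gram] / length for gram in vocabulary]


# Toy example (not a real receptor sequence).
toy_sequence = "MKLLKLL"
features = relative_frequencies(toy_sequence, ["L", "KLL", "WW"], max_n=3)
```

In this representation, restricting the analysis to a receptor domain would amount to passing only that domain's subsequence to the same functions, so that full-sequence and domain-only feature vectors can be compared on equal terms.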
