Complementing Kernel-Based Visualization of Protein Sequences with Their Phylogenetic Tree

Pharmacology is becoming increasingly dependent on advances in genomics and proteomics. This dependency brings with it the challenge of finding robust methods to analyze the complex data these fields generate. In this brief paper, we focus on the analysis of a specific type of protein, the G protein-coupled receptors, which are the target of over 15% of current drugs. We describe a kernel method of the manifold learning family for the analysis and intuitive visualization of their amino acid symbolic sequences. This method is shown to reveal the grouping structure of the sequences in a way that closely resembles the corresponding phylogenetic trees.

[18]  Yücel Saygin,et al.  Classification of GPCRs Using Family Specific Motifs , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.