Database. 7TMRmine: a Web server for hierarchical mining of 7TMR proteins

Background: Seven-transmembrane region-containing receptors (7TMRs) play central roles in eukaryotic signal transduction. Because of their biomedical importance, thorough mining of 7TMRs from diverse genomes has been an active target of bioinformatics and pharmacogenomics research. As diverse sequence information accumulates at an accelerating rate, the need for new and accurate 7TMR/GPCR prediction tools is paramount. Currently available and widely used protein classification methods (e.g., profile hidden Markov models) are highly accurate at assigning proteins to already known 7TMR subfamilies. However, these alignment-based methods are less effective at identifying remote similarities, e.g., proteins from highly divergent or possibly new 7TMR families. More sensitive (e.g., alignment-free) methods are therefore needed to complement the existing protein classification methods. A better strategy would be to combine different classifiers, from more specific to more sensitive methods, to identify a broader spectrum of 7TMR protein candidates.

Description: We developed a Web server, 7TMRmine, by integrating alignment-free and alignment-based classifiers specifically trained to identify candidate 7TMR proteins, together with transmembrane (TM) prediction methods. This new tool enables researchers to easily assess the distribution of GPCR functionality in diverse genomes or in individual newly discovered proteins. 7TMRmine is easily customized and facilitates exploratory analysis of diverse genomes. Users can integrate various alignment-based, alignment-free, and TM-prediction methods in any combination and in any hierarchical order. Sixteen classifiers (including two TM-prediction methods) are available on the 7TMRmine Web server. The 7TMRmine tool can be used not only for 7TMR mining but also for general TM-protein analysis. Users can submit protein sequences for analysis or explore pre-analyzed results for multiple genomes. The server currently includes prediction results and summary statistics for 68 genomes.

Conclusion: 7TMRmine facilitates the discovery of 7TMR proteins. By combining prediction results from different classifiers in a multi-level filtering process, prioritized sets of 7TMR candidates can be obtained for further investigation. 7TMRmine can also be used as a general TM-protein classifier. Comparisons of TM and 7TMR protein distributions among 68 genomes revealed interesting differences in the evolution of these protein families among major eukaryotic phyla.

Published: 19 June 2009
BMC Genomics 2009, 10:275 doi:10.1186/1471-2164-10-275
Received: 8 January 2009
Accepted: 19 June 2009
This article is available from: http://www.biomedcentral.com/1471-2164/10/275
© 2009 Lu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
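The multi-level filtering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the 7TMRmine implementation: the names `hierarchical_filter` and `count_hydrophobic_segments`, the hydrophobic residue set, and all thresholds are assumptions made for illustration, and the toy segment counter only stands in for a real TM-prediction method. The sketch applies classifiers in a user-chosen order and records how many consecutive levels each protein passes, which gives a simple way to prioritize candidates.

```python
from typing import Callable, Dict, List, Tuple

# A classifier is any function that accepts a protein sequence and returns True/False.
Classifier = Callable[[str], bool]

# Hypothetical residue set used by the toy TM-segment counter below.
HYDROPHOBIC = set("AILMFVWC")


def count_hydrophobic_segments(seq: str, window: int = 19,
                               min_fraction: float = 0.7) -> int:
    """Toy stand-in for a TM predictor (an assumption, not a real method):
    count non-overlapping windows that are mostly hydrophobic."""
    count, i = 0, 0
    while i + window <= len(seq):
        fraction = sum(res in HYDROPHOBIC for res in seq[i:i + window]) / window
        if fraction >= min_fraction:
            count += 1
            i += window  # skip past the segment just counted
        else:
            i += 1
    return count


def hierarchical_filter(sequences: Dict[str, str],
                        levels: List[Tuple[str, Classifier]]) -> Dict[str, int]:
    """Apply classifiers in the given order; a protein reaches level k only
    if it passed levels 1..k-1. Returns the number of levels passed per ID."""
    depth = {seq_id: 0 for seq_id in sequences}
    survivors = list(sequences)
    for level_index, (_name, classifier) in enumerate(levels, start=1):
        survivors = [sid for sid in survivors if classifier(sequences[sid])]
        for sid in survivors:
            depth[sid] = level_index
    return depth


# Example levels: a TM-count filter first, then a (hypothetical) length filter.
levels = [
    ("tm_count_5_to_9", lambda s: 5 <= count_hydrophobic_segments(s) <= 9),
    ("length_250_to_1200", lambda s: 250 <= len(s) <= 1200),
]

proteins = {
    "seven_segments": ("L" * 20 + "K" * 20) * 7,
    "five_segments": ("L" * 20 + "K" * 20) * 5,
    "soluble": "K" * 300,
}

depths = hierarchical_filter(proteins, levels)
# Proteins that pass more levels are listed first.
ranked = sorted(depths, key=depths.get, reverse=True)
```

Reordering or swapping the entries of `levels` changes which proteins survive to the deeper levels, which mirrors the idea of moving from more specific to more sensitive classifiers.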
