Predicting Residue-Residue Contacts and Helix-Helix Interactions in Transmembrane Proteins Using an Integrative Feature-Based Random Forest Approach

Integral membrane proteins account for 25–30% of the proteins encoded in genomes and play crucial roles in many biological processes. However, fewer than 1% of membrane protein structures are available in the Protein Data Bank. It is therefore important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) to residue-residue contact prediction in transmembrane proteins, which we term TMhhcp. Rigorous cross-validation tests indicate that the RF models outperform two state-of-the-art methods, TMHcon and MEMPACK. Under a strict leave-one-protein-out jackknife procedure, the models reached top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further used to predict interacting helical pairs, achieving Matthews correlation coefficients of 0.430 and 0.424 for the two residue contact definitions, respectively. To serve the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp.
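
The two evaluation measures quoted above, top L/5 accuracy and the Matthews correlation coefficient, can be illustrated with a minimal sketch. It is not the TMhhcp implementation. The function names, the input layout (residue pairs with a predicted score), and the reading of L as a length supplied by the caller are assumptions made for illustration only; the abstract does not state how L is defined.

```python
import math

def top_l_over_5_accuracy(scored_pairs, true_contacts, L):
    """Fraction of the L/5 highest-scoring predicted pairs that are true contacts.

    scored_pairs  : list of ((i, j), score) tuples (hypothetical layout)
    true_contacts : set of (i, j) pairs observed in the known structure
    L             : length used for normalisation (assumed to be given)
    """
    k = max(1, L // 5)  # keep at least one prediction
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if pair in true_contacts)
    return hits / len(ranked)

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data, invented purely to exercise the functions.
toy_scores = [((1, 8), 0.9), ((2, 7), 0.8), ((3, 9), 0.4)]
toy_truth = {(1, 8), (3, 9)}
toy_accuracy = top_l_over_5_accuracy(toy_scores, toy_truth, L=10)
toy_mcc = matthews_cc(tp=5, fp=1, tn=5, fn=1)
```

In the toy case L = 10, so only the two highest-scoring pairs are evaluated; one of them is a true contact. For helix-helix interaction prediction, the same `matthews_cc` function would be applied to counts of correctly and incorrectly predicted interacting helical pairs.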
