COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming

In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins that integrates a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP, which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence score, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used instead. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets, with a contact defined as a Cα-Cα distance within 14 Å. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3%, and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4%, and an MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM proteins increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/. Proteins 2016; 84:332–348. © 2016 Wiley Periodicals, Inc.
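The selection rule and the reported metrics can be illustrated with a minimal Python sketch. It is not the COMSAT implementation: the function names, the score dictionary, and the `milp_fallback` callable are hypothetical, and the inclusive 14 Å comparison is an assumption. The metric formulas assume the definitions conventional in contact prediction, where accuracy is TP/(TP+FP) and coverage is TP/(TP+FN); the abstract itself does not spell these out.

```python
import math


def true_contacts(ca_coords, cutoff=14.0, min_sep=6):
    """Pairs (i, j) with C-alpha distance <= cutoff (angstroms) and j - i >= min_sep.

    The inclusive comparison is an assumption; the abstract only gives 14 angstroms.
    """
    n = len(ca_coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    }


def hybrid_predict(svm_scores, threshold, milp_fallback):
    """Keep SVM pairs scoring above the threshold, ranked by confidence.

    If no pair passes, defer to the (hypothetical) MILP fallback callable.
    """
    kept = [pair for pair, score in svm_scores.items() if score > threshold]
    kept.sort(key=lambda pair: svm_scores[pair], reverse=True)
    return kept if kept else list(milp_fallback())


def evaluate(predicted, actual, n_candidates):
    """Accuracy, coverage, specificity, and MCC over n_candidates residue pairs."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = n_candidates - tp - fp - fn

    def ratio(num, den):
        # Report 0.0 rather than failing when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp, tp + fp),
        "coverage": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Toy chain: 10 residues on a line, 2.2 angstroms apart (illustrative only).
    coords = [(2.2 * i, 0.0, 0.0) for i in range(10)]
    actual = true_contacts(coords)
    scores = {(0, 6): 0.9, (0, 7): 0.7, (1, 7): 0.4}  # made-up confidences
    predicted = hybrid_predict(scores, threshold=0.5, milp_fallback=lambda: [])
    print(predicted, evaluate(predicted, actual, n_candidates=10))
```

Because true contacts are a small fraction of all candidate pairs, specificity stays close to 1 even when coverage is low, which is consistent with the pattern in the figures reported above.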
