Towards accurate residue–residue hydrophobic contact prediction for α‐helical proteins via integer linear optimization

A new optimization‐based method is presented for predicting hydrophobic residue contacts in α‐helical proteins. The approach uses a high‐resolution, distance‐dependent force field to calculate the interaction energy between pairs of residues in a protein, and the formulation predicts the hydrophobic contacts by minimizing the sum of these contact energies. Such residue contacts are useful for narrowing the conformational space searched by protein structure prediction algorithms. The formulation can also produce a rank‐ordered list of the best contact sets. The model was tested on four independent α‐helical protein test sets. For single‐domain proteins, the average accuracy of predicted contacts separated by at least six residues was ∼66%. Average true‐positive and false‐positive distances were also calculated for each protein test set; they were 8.87 and 14.67 Å, respectively. Proteins 2009. © 2008 Wiley‐Liss, Inc.
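The idea of selecting contacts by minimizing summed contact energies, and then listing the best contact sets in rank order, can be illustrated with a toy sketch. This is not the formulation from the paper: the energies, the rule that each residue joins at most one contact, and the function name `ranked_contact_sets` are all assumptions made for illustration, and exhaustive enumeration stands in for an integer linear programming solver.

```python
from itertools import combinations

# Hypothetical pairwise contact energies (arbitrary units) between
# hydrophobic residue positions; lower is more favourable.
# These are illustrative values, not data from the paper.
ENERGY = {
    (2, 9): -1.8,
    (2, 16): -0.9,
    (5, 16): -1.5,
    (9, 20): -1.1,
    (5, 9): -2.0,  # only four residues apart, so excluded below
}
MIN_SEPARATION = 6  # contacts must be at least six residues apart


def ranked_contact_sets(energy, n_contacts, top_k):
    """Return the top_k lowest-energy sets of n_contacts contacts.

    Each candidate pair (i, j) plays the role of a binary variable.
    Assumed constraint: a residue takes part in at most one contact.
    """
    candidates = [p for p in energy if abs(p[1] - p[0]) >= MIN_SEPARATION]
    feasible = []
    for subset in combinations(sorted(candidates), n_contacts):
        residues = [r for pair in subset for r in pair]
        if len(residues) == len(set(residues)):  # no residue used twice
            total = sum(energy[p] for p in subset)
            feasible.append((round(total, 6), subset))
    feasible.sort()  # ascending energy gives the rank-ordered list
    return feasible[:top_k]


if __name__ == "__main__":
    for total, contacts in ranked_contact_sets(ENERGY, n_contacts=2, top_k=3):
        print(total, contacts)
```

Enumeration is only practical for a handful of candidate pairs. With an integer linear solver, a comparable rank-ordered list is typically obtained by re-solving after adding a cut that excludes each solution already found.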
