CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization

Most protein structure prediction methods use a multi-step process that often includes secondary structure prediction, contact prediction, fragment generation and clustering. For many years, secondary structure prediction has been the workhorse of numerous methods for predicting protein structure and function. This paper presents a new consensus method based on mixed integer linear optimization (MILP): a Consensus scheme based On a mixed integer liNear optimization method for seCOndary stRucture preDiction (CONCORD). The method combines seven secondary structure prediction methods (SSpro, DSC, PROF, PROFphd, PSIPRED, Predator and GorIV), exploiting the strengths of each to maximize the number of correctly predicted amino acids and thereby achieve better prediction accuracy. CONCORD performs well compared with the seven individual methods when tested on the PDBselect25 training protein set using sixfold cross validation. It also performs well compared with a separate set of 10 online secondary structure prediction servers (including several recent ones) when tested on the CASP9 targets (http://predictioncenter.org/casp9/). The average Q3 prediction accuracy is 83.04 per cent for the sixfold cross validation on the PDBselect25 set and 82.3 per cent for the CASP9 targets. A web server, CONCORD, is available to the scientific community at http://helios.princeton.edu/CONCORD.
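To make the two quantities in the abstract concrete, the following minimal sketch computes a Q3 score (the percentage of residues whose three-state label, helix H, strand E or coil C, matches the observed one) and forms a per-residue consensus by weighted voting. It is not the MILP formulation used by CONCORD, which the abstract does not specify; the weighted vote is a simple stand-in, and the function names, weights and toy sequences are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

STATES = ("H", "E", "C")  # helix, strand, coil


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def weighted_consensus(predictions: dict[str, str], weights: dict[str, float]) -> str:
    """Per-residue weighted vote over several predictors.

    Illustrative stand-in only: the weights here are hypothetical and are
    not the result of the MILP optimization described in the paper.
    """
    lengths = {len(states) for states in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    n = lengths.pop()

    consensus = []
    for i in range(n):
        score = defaultdict(float)
        for method, states in predictions.items():
            score[states[i]] += weights[method]
        # Ties are broken by the order of STATES (H, then E, then C).
        consensus.append(max(STATES, key=lambda s: score[s]))
    return "".join(consensus)


# Toy data (hypothetical sequences and weights).
observed = "CHHHHEEC"
predictions = {
    "PSIPRED": "CHHHHEEC",
    "PROF": "CHHHCEEC",
    "GorIV": "CCHHHEEE",
}
weights = {"PSIPRED": 1.0, "PROF": 0.8, "GorIV": 0.5}

combined = weighted_consensus(predictions, weights)
print(combined, q3(combined, observed))
```

In this toy case the vote corrects the isolated errors of the individual predictors, which is the intuition behind consensus schemes; an optimization-based method would instead choose its combination parameters by maximizing the number of correctly predicted residues on a training set.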


[78]  Christodoulos A. Floudas,et al.  An improved hybrid global optimization method for protein tertiary structure prediction , 2010, Comput. Optim. Appl..