Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta

We describe CASP11 de novo blind structure predictions made using the Rosetta structure prediction methodology with both automatic and human-assisted protocols. Model accuracy generally improved when coevolution-derived residue–residue contact information was used as restraints during Rosetta conformational sampling and refinement, particularly when the number of sequences in the family was more than three times the length of the protein. The highlight was the human-assisted prediction of T0806, a large and topologically complex target with no homologs of known structure, which had unprecedented accuracy: <3.0 Å root-mean-square deviation (RMSD) from the crystal structure over 223 residues. For this target, we increased the amount of conformational sampling beyond that of our fully automated method by employing an iterative hybridization protocol. Our results clearly demonstrate, in a blind prediction scenario, that coevolution-derived contacts can considerably increase the accuracy of template-free structure modeling. Proteins 2016; 84(Suppl 1):67–75. © 2015 Wiley Periodicals, Inc.
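The two quantitative criteria in the abstract can be illustrated with a minimal sketch: the sequences-per-length threshold (more than three sequences per residue) and the RMSD used to report accuracy. The function names, the `min_ratio` parameter, and the example numbers are hypothetical and are not taken from the Rosetta code base. The sketch assumes the two coordinate sets are already optimally superimposed and matched residue by residue, and it does not say how sequences in the family should be counted (for example, after redundancy filtering).

```python
import math


def has_sufficient_depth(n_sequences, protein_length, min_ratio=3.0):
    """Return True when the family has more than `min_ratio` sequences per residue.

    `min_ratio=3.0` mirrors the "more than three times the length" rule of thumb
    in the abstract; how sequences are counted is an assumption left to the caller.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return n_sequences / protein_length > min_ratio


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical numbers: a 223-residue protein with 700 sequences in its family.
    print(has_sufficient_depth(700, 223))
    # Toy coordinates: every atom displaced by 3 Å along z.
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 0, 3)]))
```

In practice RMSD is reported after optimal superposition (for example with the Kabsch algorithm), which this sketch deliberately omits.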
