Modeling structurally variable regions in homologous proteins with Rosetta

A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm Rosetta for predicting conformations of structurally divergent regions in comparative models. Initial conformations for short segments are selected from the protein structure database, whereas longer segments are built up by using three‐ and nine‐residue fragments drawn from the database and combined by using the Rosetta algorithm. A gap closure term in the potential, in combination with a modified Newton's method for gradient descent minimization, is used to ensure continuity of the peptide backbone. Conformations of variable regions are refined in the context of a fixed template structure using Monte Carlo minimization together with rapid repacking of side‐chains to iteratively optimize backbone torsion angles and side‐chain rotamers. For short loops, mean accuracies of 0.69, 1.45, and 3.62 Å are obtained for 4‐, 8‐, and 12‐residue loops, respectively. In addition, the method can provide reasonable models of conformations of longer protein segments: predicted conformations of 3 Å root‐mean‐square deviation or better were obtained for 5 of 10 examples of segments ranging from 13 to 34 residues. In combination with a sequence alignment algorithm, this method generates complete, ungapped models of protein structures, including regions both similar to and divergent from a homologous structure.
This combined method was used to make predictions for 28 protein domains in the Critical Assessment of Protein Structure Prediction 4 (CASP 4) and 59 domains in CASP 5, where the method ranked highly among comparative modeling and fold recognition methods. Model accuracy in these blind predictions is dominated by alignment quality, but in the context of accurate alignments, long protein segments can be accurately modeled. Notably, the method correctly predicted the local structure of a 39‐residue insertion into a TIM barrel in CASP 5 target T0186. Proteins 2004. © 2004 Wiley‐Liss, Inc.
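
The interplay between the gap closure term and Monte Carlo minimization described above can be illustrated with a small toy model. The sketch below is not Rosetta code and makes several simplifying assumptions: the "loop" is a planar chain of unit-length bonds, the energy consists only of a squared gap between the chain end and a fixed anchor point, and a plain gradient descent with a backtracking line search stands in for the modified Newton's method. All function and parameter names (`chain_end`, `gap_energy`, `minimize`, `monte_carlo_minimize`, `kT`, `weight`) are hypothetical and chosen for illustration.

```python
import math
import random


def chain_end(angles, bond=1.0):
    """End point of a planar chain; each angle is a turn relative to the previous bond."""
    x = y = heading = 0.0
    for a in angles:
        heading += a
        x += bond * math.cos(heading)
        y += bond * math.sin(heading)
    return x, y


def gap_energy(angles, target, weight=1.0):
    """Toy gap closure term: weighted squared distance from chain end to the anchor."""
    ex, ey = chain_end(angles)
    return weight * ((ex - target[0]) ** 2 + (ey - target[1]) ** 2)


def minimize(angles, target, steps=100, h=1e-6):
    """Gradient descent with a backtracking (Armijo) line search.

    Assumption: this is a simple stand-in for the quasi-Newton style
    minimizer mentioned in the text, using a finite-difference gradient.
    """
    angles = list(angles)
    energy = gap_energy(angles, target)
    for _ in range(steps):
        grad = []
        for i in range(len(angles)):
            bumped = angles[:]
            bumped[i] += h
            grad.append((gap_energy(bumped, target) - energy) / h)
        grad_sq = sum(g * g for g in grad)
        step = 1.0
        while step > 1e-8:
            trial = [a - step * g for a, g in zip(angles, grad)]
            trial_energy = gap_energy(trial, target)
            if trial_energy <= energy - 1e-4 * step * grad_sq:
                angles, energy = trial, trial_energy
                break
            step *= 0.5
        else:
            break  # no acceptable step found; treat as converged
    return angles, energy


def monte_carlo_minimize(n_bonds, target, cycles=200, kT=0.1, seed=0):
    """Perturb one angle, minimize, then accept or reject with the Metropolis criterion."""
    rng = random.Random(seed)
    current = [rng.uniform(-math.pi, math.pi) for _ in range(n_bonds)]
    current, e_cur = minimize(current, target)
    best, e_best = current, e_cur
    for _ in range(cycles):
        trial = current[:]
        trial[rng.randrange(n_bonds)] += rng.gauss(0.0, 0.5)
        trial, e_trial = minimize(trial, target)
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best
```

Calling `monte_carlo_minimize(6, (3.0, 1.0))` returns the lowest-energy set of angles found and its remaining gap energy. In a realistic setting the gap term would be only one component of a much richer potential, and the perturbation step would draw on fragment insertions and side-chain repacking rather than random angle changes.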
