Efficient methods for filtering and ranking fragments for the prediction of structurally variable regions in proteins

Predicting protein 3D structure near insertions and deletions, or more generally loop prediction, remains one of the major challenges in homology modeling. In this article, we develop ranking criteria and selection filters to improve knowledge‐based loop prediction. The criteria were developed and optimized on a test data set of 678 insertions and deletions. Each example is, in principle, predictable from the loop database used with an RMSD < 1 Å, and the examples represent realistic modeling situations. Four noncorrelated criteria for the selection of fragments are evaluated. A fast prefilter compares the distance between the anchor groups in the template protein with the distance between the stems of each fragment. The RMSD of the anchor groups is used for fitting and ranking the selected loop candidates. After fitting, candidates that make repulsive close contacts with the template protein are filtered out, and fragments whose backbone torsion angles are unfavorable according to a knowledge‐based potential are eliminated. Applying these filter criteria in combination to the test set increased the percentage of predictions with a global RMSD < 1 Å to over 50% among the first five ranks; the average global RMSD of the first‐ranked candidate lies between 1.3 and 2.2 Å, depending on loop length. Unlike many examples described in the literature, our test cases are not self‐predictions, in which a loop is rebuilt in a protein after the original peptide loop has been cut out; they are attempts to predict the structural changes that occur in evolution when a protein is affected by insertions and deletions. Proteins 2004;54:000–000. © 2003 Wiley‐Liss, Inc.
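The filtering and ranking steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Fragment` class, the function names, and the thresholds `span_tol` and `min_dist` are hypothetical choices made for illustration; the abstract gives no cutoff values. The sketch also assumes that each fragment has already been superimposed on the template anchors, so the fitting step itself is not shown, and it omits the torsion-angle filter because the knowledge‐based potential is not specified here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Fragment:
    """Hypothetical container for one loop candidate from a fragment database."""
    name: str
    stems: List[Coord]  # stem atoms, assumed already superimposed on the anchors
    loop: List[Coord]   # loop atoms in the same (fitted) coordinate frame


def span(points: List[Coord]) -> float:
    """Distance between the first and last point of an anchor or stem set."""
    return math.dist(points[0], points[-1])


def rmsd(a: List[Coord], b: List[Coord]) -> float:
    """Root-mean-square deviation between two equally long coordinate lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def has_clash(loop: List[Coord], environment: List[Coord], min_dist: float) -> bool:
    """True if any loop atom lies closer than min_dist to a template atom."""
    return any(math.dist(p, q) < min_dist for p in loop for q in environment)


def select_fragments(anchors, environment, fragments,
                     span_tol=1.0, min_dist=3.0, top_n=5):
    """Apply prefilter and clash filter, then rank survivors by anchor RMSD.

    span_tol and min_dist (in angstroms) are assumed values, not from the paper.
    """
    target = span(anchors)
    ranked = []
    for frag in fragments:
        # Fast prefilter: stem-to-stem distance must match the anchor distance.
        if abs(span(frag.stems) - target) > span_tol:
            continue
        # Clash filter: reject candidates with close contacts to the template.
        if has_clash(frag.loop, environment, min_dist):
            continue
        ranked.append((rmsd(anchors, frag.stems), frag.name))
    ranked.sort()  # lowest anchor RMSD first
    return ranked[:top_n]


# Toy data: two anchor atoms, one template atom, four candidate fragments.
anchors = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
environment = [(2.0, 0.5, 0.0)]
fragments = [
    Fragment("ok2", [(0.0, 0.5, 0.0), (4.0, 0.5, 0.0)], [(2.0, 5.0, 0.0)]),
    Fragment("too_long", [(0.0, 0.0, 0.0), (7.0, 0.0, 0.0)], [(3.5, 5.0, 0.0)]),
    Fragment("clash", [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [(2.0, 1.0, 0.0)]),
    Fragment("good", [(0.0, 0.0, 0.1), (4.0, 0.0, 0.1)], [(2.0, 4.0, 0.0)]),
]
result = select_fragments(anchors, environment, fragments)
print([name for _, name in result])
```

In this toy run, `too_long` fails the distance prefilter, `clash` is rejected for a close contact, and the two remaining candidates are ordered by anchor RMSD. Running the cheap distance test first mirrors the role of the fast prefilter: most database fragments can be discarded before any more expensive check is made.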
