Positioning of anchor groups in protein loop prediction: The importance of solvent accessibility and secondary structure elements

Predicting loop regions remains an unsolved problem in homology-based protein structure prediction. In an earlier publication, we showed that correct placement of the anchor groups, the amino acids that are connected by a loop fragment of predicted geometry, is an essential step in the process (Lessel and Schomburg, Proteins 1999;37:56–64). In this article, we analyze the quality of possible loop predictions with respect to gap length, fragment length, amino acid type, secondary structure, and solvent accessibility. For 550 insertions and 544 deletions, we tested all possible anchor group positions with an inserted loop of 3 to 12 amino acids. We show that approximately 80% of the indel regions can be predicted within 1.5 Å RMSD from a knowledge‐based loop database, provided that criteria for the correct localization of anchor groups can be found and the loops can be sorted correctly. Our analysis yields several conclusions regarding the optimal placement of anchor groups: (1) correct placement of anchor groups becomes even more important as gap length increases; (2) medium-length fragments (length 5–8) perform better than short or long ones; (3) placing anchor groups at hydrophobic amino acids gives a higher chance of including the best possible loop; (4) positions within secondary structure elements, in particular β‐sheets, are suitable for anchor groups; (5) amino acids with lower solvent accessibility make better anchor groups. A preliminary test using a combination of the anchor group positioning criteria deduced from our analysis shows very promising results. Proteins 2002;47:370–378. © 2002 Wiley‐Liss, Inc.
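
To make the phrase "all possible positions for anchor groups" concrete, the following is a minimal sketch of how candidate anchor pairs around an insertion could be enumerated for fragment lengths 3 to 12, together with a plain RMSD check against the 1.5 Å threshold. It is an illustration under stated assumptions and not the procedure used in the article: the function names (`candidate_anchor_pairs`, `rmsd`), the 0-based half-open indexing, the restriction to insertions, and the assumption that coordinates are already superposed are all hypothetical choices made here.

```python
from math import sqrt


def candidate_anchor_pairs(indel_start, indel_end, seq_len, min_len=3, max_len=12):
    """List (n_anchor, c_anchor, fragment_length) for fragments covering an insertion.

    Assumption: residues indel_start .. indel_end - 1 (0-based, half-open) are the
    inserted residues. A fragment spans residues n_anchor + 1 .. c_anchor - 1 and
    must contain every inserted residue; the anchors are the flanking residues.
    """
    pairs = []
    for frag_len in range(min_len, max_len + 1):
        # 'first' is the index of the first fragment residue. The fragment must
        # start at or before the insertion and end at or after its last residue.
        for first in range(indel_end - frag_len, indel_start + 1):
            n_anchor = first - 1
            c_anchor = first + frag_len
            if n_anchor < 0 or c_anchor >= seq_len:
                continue  # an anchor would fall outside the chain
            pairs.append((n_anchor, c_anchor, frag_len))
    return pairs


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: both coordinate sets are already superposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


def within_threshold(coords_a, coords_b, threshold=1.5):
    """True if the two loops agree within the given RMSD threshold (in Å)."""
    return rmsd(coords_a, coords_b) <= threshold
```

In this sketch, a fragment of length n can be placed in n − g + 1 ways around an insertion of g residues when it is far from the chain ends, so longer fragments offer more anchor placements to choose from. Criteria such as residue hydrophobicity, secondary structure, or solvent accessibility could then be used to rank the resulting anchor pairs; how the article combines them is not specified in the abstract.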

[1] B. Lee, et al. The interpretation of protein structures: estimation of static accessibility, 1971, Journal of molecular biology.

[2] M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of molecular biology.

[3] T. A. Jones, et al. Using known substructures in protein model building and crystallography, 1986, The EMBO journal.

[4] J. Moult, et al. An algorithm for determining the conformation of polypeptide segments in proteins by systematic search, 1986, Proteins.

[5] T. Blundell, et al. Knowledge based modelling of homologous proteins, Part I: Three-dimensional frameworks derived from the simultaneous superposition of multiple structures, 1987, Protein engineering.

[6] M. Karplus, et al. Prediction of the folding of short polypeptide segments by uniform conformational sampling, 1987, Biopolymers.

[7] R. Diamond. A note on the rotational superposition problem, 1988.

[8] J. L. Sussman, et al. A 3D building blocks approach to analyzing and predicting structure of proteins, 1989, Proteins.

[9] A. C. Martin, et al. Modeling antibody hypervariable loops: a combined algorithm, 1989, Proceedings of the National Academy of Sciences of the United States of America.

[10] S. Wodak, et al. Modelling the polypeptide backbone with 'spare parts' from known protein structures, 1989, Protein engineering.

[11] B. L. Sibanda, et al. Conformation of beta-hairpins in protein structures. A systematic classification with applications to modelling by homology, electron density fitting and protein engineering, 1989, Journal of molecular biology.

[12] M. J. Rooman, et al. Automatic definition of recurrent local structure motifs in proteins, 1990, Journal of molecular biology.

[13] M. Levitt. Accurate modeling of protein conformation by automatic segment matching, 1992, Journal of molecular biology.

[14] U. Lessel, et al. Similarities between protein 3-D structures, 1994, Protein engineering.

[15] D. Schomburg, et al. Prediction of protein three-dimensional structures in insertion and deletion regions: a procedure for searching data bases of representative protein fragments using geometric scoring criteria, 1995, Journal of molecular biology.

[16] S. Sudarsanam, et al. Modeling protein loops using a ϕi+1, Ψi dimer database, 1995, Protein science: a publication of the Protein Society.

[17] J. Fetrow. Omega loops; nonregular secondary structures significant in protein function and stability, 1995, FASEB journal: official publication of the Federation of American Societies for Experimental Biology.

[18] T. Blundell, et al. Conformational analysis and clustering of short and medium size loops connecting regular secondary structures: A database for modeling and prediction, 1996, Protein science: a publication of the Protein Society.

[19] U. Lessel, et al. Creation and characterization of a new, non-redundant fragment data bank, 1997, Protein engineering.

[20] T. Blundell, et al. Predicting the conformational class of short and medium size loops connecting regular secondary structures: application to comparative modelling, 1997, Journal of molecular biology.

[21] M. Karplus, et al. PDB-based protein loop prediction: parameters for selection and methods for optimization, 1997, Journal of molecular biology.

[22] Charlotte M. Deane, et al. JOY: protein sequence-structure representation and analysis, 1998, Bioinform.

[23] J. Wójcik, et al. New efficient statistical sequence-dependent structure prediction of short to medium-sized protein loops based on an exhaustive loop classification, 1999, Journal of molecular biology.

[24] T. Alwyn Jones, et al. CASP3 comparative modeling evaluation, 1999, Proteins.

[25] W. Li, et al. Exploring the conformational diversity of loops on conserved frameworks, 1999, Protein engineering.

[26] U. Lessel, et al. Importance of anchor group positioning in protein loop prediction, 1999, Proteins.

[27] J. Moult, et al. Predicting protein three-dimensional structure, 1999, Current opinion in biotechnology.

[28] C. Deane, et al. A novel exhaustive search algorithm for predicting the conformation of polypeptide segments in proteins, 2000, Proteins.

[29] T. N. Bhat, et al. The Protein Data Bank, 2000, Nucleic Acids Res.

[30] B. Qian, et al. Distribution of indel lengths, 2001, Proteins.