Modeling protein loops with knowledge-based prediction of sequence-structure alignment

MOTIVATION: As the protein structure database expands, protein loop modeling remains an important yet challenging problem. Knowledge-based loop prediction methods face two methodological challenges: (1) loop boundaries in protein structures are often ambiguous, which complicates the construction of length-dependent loop databases for loop prediction; (2) knowledge-based modeling of loops of unknown structure requires both aligning a query loop sequence to loop templates and ranking the resulting sequence-template matches.

RESULTS: We developed a knowledge-based loop prediction method that circumvents the need to construct hierarchically clustered, length-dependent loop libraries. The method first predicts local structural fragments for a query loop sequence and then structurally aligns the predicted fragments to a set of non-redundant loop structural templates, regardless of loop length. The sequence-template alignments are then quantitatively evaluated with an artificial neural network model trained on a set of predictions with known outcomes. Prediction accuracy benchmarks indicate that the procedure offers an alternative approach that overcomes both challenges of knowledge-based loop prediction.

AVAILABILITY: http://cmb.genomics.sinica.edu.tw
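
The idea of matching a query against loop templates regardless of loop length can be illustrated with a small sketch. It is not the published method: it assumes the predicted local structure of the query is already available as a string over a hypothetical one-letter structural-state alphabet, and it swaps in a plain Smith-Waterman local alignment score for the neural network evaluation described above. The function names, the scoring parameters, and the template strings are all invented for illustration.

```python
from typing import Dict, List, Tuple


def local_align_score(query: str, template: str,
                      match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score of two structural-state strings.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    cols = len(template) + 1
    prev = [0] * cols          # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            same = query[i - 1] == template[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_templates(query_states: str,
                   templates: Dict[str, str]) -> List[Tuple[str, int]]:
    """Score every template, whatever its length, and sort best first."""
    scored = [(name, local_align_score(query_states, states))
              for name, states in templates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical templates of different lengths; letters are placeholder states.
templates = {
    "loop_a": "HHTTGEE",
    "loop_b": "EETTTTTSEE",
    "loop_c": "HHHHH",
}
ranking = rank_templates("TTGE", templates)
for name, score in ranking:
    print(name, score)
```

Because local alignment does not require the two strings to have equal length, no length-specific template library is needed in this sketch; a trained model would replace the raw alignment score in the ranking step.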
