Building and assessing atomic models of proteins from structural templates: Learning and benchmarks

One approach to predicting a protein fold from a sequence (the target) uses the structures of related proteins as templates. We present an algorithm that examines a set of template candidates, builds an atomically detailed model from each, and ranks the models. The algorithm selects the best model hierarchically, using a diverse set of signals. After a quick, suboptimal screening of template candidates from the Protein Data Bank, the method narrows the selection to a few models. More detailed signals then test the compatibility of the sequence with the proposed structures and are merged into a global fitness measure using linear programming. This algorithm is a component of the prediction server LOOPP (http://www.loopp.org). Large-scale training and test sets were designed and are presented. Recent results of the LOOPP server in CASP8 are discussed. Proteins 2009. © 2009 Wiley-Liss, Inc.
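The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the LOOPP implementation: the `Candidate` class, the signal names (`threading`, `secondary_structure`), the scores, and the weights are all hypothetical. In particular, the weights are fixed by hand here, whereas the abstract states that the signals are merged using linear programming.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """A template candidate with hypothetical scores (higher is better)."""
    template_id: str
    quick_score: float          # cheap score used for the initial screening
    signals: Dict[str, float]   # detailed signals computed for the built model


def fitness(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Merge detailed signals into one global fitness value (weighted sum)."""
    return sum(weights[name] * signals[name] for name in weights)


def rank_models(candidates: List[Candidate],
                weights: Dict[str, float],
                keep: int = 3) -> List[Candidate]:
    """Stage 1: keep the top `keep` candidates by the quick score.
    Stage 2: rank the survivors by the merged fitness of detailed signals."""
    shortlist = sorted(candidates, key=lambda c: c.quick_score, reverse=True)[:keep]
    return sorted(shortlist, key=lambda c: fitness(c.signals, weights), reverse=True)


# Hypothetical weights; assumed here, not learned by linear programming.
weights = {"threading": 0.6, "secondary_structure": 0.4}

# Hypothetical candidates with made-up scores.
candidates = [
    Candidate("A", 0.9, {"threading": 0.2, "secondary_structure": 0.5}),
    Candidate("B", 0.8, {"threading": 0.7, "secondary_structure": 0.6}),
    Candidate("C", 0.7, {"threading": 0.4, "secondary_structure": 0.9}),
    Candidate("D", 0.1, {"threading": 0.9, "secondary_structure": 0.9}),
]

ranked = rank_models(candidates, weights, keep=3)
ranked_ids = [c.template_id for c in ranked]
print(ranked_ids)
```

Candidate `D` has the best detailed signals in this toy data but is discarded by the quick screening, which shows the sense in which the first stage is fast but suboptimal.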
