Modeling Conformation of Protein Loops by Bayesian Network

Modeling protein loops is important for understanding the characteristics and functions of proteins, but it remains an unsolved problem in computational biology. By employing a general Bayesian network, this paper constructs a fully probabilistic continuous model of protein loops, referred to as LoopBN. Direct influences between amino acids and backbone torsion angles can be learned within the LoopBN framework. The continuous torsion angle pairs of the loops are captured by a bivariate von Mises distribution. Empirical tests on 8 free modeling targets of CASP8 are conducted to evaluate the performance of LoopBN. Experimental results show that LoopBN not only outperforms the state-of-the-art modeling method in the quality of the loop sample set, but also helps de novo prediction of protein structure by providing a better sample set for loop refinement.
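As an illustration of how a torsion angle pair can be scored under a bivariate von Mises distribution, the sketch below evaluates the unnormalized log-density of the cosine variant of that distribution. The abstract does not state which variant LoopBN uses, so the choice of the cosine model is an assumption, and the function name and the parameters mu, nu, k1, k2, k3 are hypothetical names chosen for this example. The normalizing constant is omitted.

```python
import math


def bvm_cosine_log_density_unnorm(phi, psi, mu, nu, k1, k2, k3):
    """Unnormalized log-density of a bivariate von Mises (cosine model).

    phi, psi : torsion angles in radians
    mu, nu   : location parameters for phi and psi (assumed names)
    k1, k2   : concentration parameters for phi and psi (assumed names)
    k3       : coupling parameter between the two angles (assumed name)
    """
    return (
        k1 * math.cos(phi - mu)
        + k2 * math.cos(psi - nu)
        - k3 * math.cos((phi - mu) - (psi - nu))
    )


# Example: score an angle pair against hypothetical parameters.
score = bvm_cosine_log_density_unnorm(
    phi=math.radians(-60.0), psi=math.radians(-45.0),
    mu=math.radians(-63.0), nu=math.radians(-43.0),
    k1=10.0, k2=8.0, k3=2.0,
)
```

Because the density depends on the angles only through cosines of differences, it is periodic in both phi and psi, which is why distributions of this family suit angular data such as backbone torsion angles.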
