A spatio-temporal mining approach towards summarizing and analyzing protein folding trajectories

Understanding the protein folding mechanism remains a grand challenge in structural biology. Over the past several years, molecular dynamics methods have been employed to shed light on the folding process. With high-performance computing and large-scale storage, researchers can now simulate protein folding in atomistic detail at femtosecond temporal resolution. Such simulations often produce a large number of folding trajectories, each consisting of a series of 3D conformations of the protein under study. As a result, effectively managing and analyzing these trajectories is becoming increasingly important.

In this article, we present a spatio-temporal mining approach to analyzing protein folding trajectories. It exploits the simplicity of contact maps while also integrating 3D structural information into the analysis. It characterizes the dynamic folding process by first identifying spatio-temporal association patterns in contact maps and then studying how these patterns evolve along a folding trajectory. We demonstrate that such patterns can be leveraged to summarize folding trajectories and to facilitate the detection and ordering of important folding events along a folding path. We also show that they can be used to identify a consensus partial folding pathway across multiple folding trajectories. Furthermore, we argue that such patterns can capture both local and global structural topology in a 3D protein conformation, thereby facilitating effective structural comparison among conformations.

We apply this approach to the folding trajectories of two small synthetic proteins, BBA5 and GSGS (also known as Beta3S). We show that the approach is promising for addressing the issues above: folding trajectory summarization, detection and ordering of folding events, and identification of a consensus partial folding pathway across trajectories.
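The abstract refers to contact maps and to detecting and ordering folding events, without giving details. As a rough illustration only, the Python sketch below builds a sparse binary contact map for each frame of a trajectory and orders contacts by when they first form and persist. It is a deliberately simplified stand-in, not the authors' spatio-temporal association mining: the 8 Å distance cutoff, the minimum sequence separation of 3, the `persist` criterion, and the function names `contact_map` and `contact_formation_order` are all assumptions introduced here, and each frame is assumed to be a list of per-residue (e.g. Cα) coordinates.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]  # one residue position (e.g. C-alpha), in angstroms
Contact = Tuple[int, int]           # residue index pair (i, j) with i < j


def contact_map(frame: Sequence[Coord], threshold: float = 8.0,
                min_sep: int = 3) -> Set[Contact]:
    """Sparse binary contact map for one conformation.

    A pair (i, j) is a contact when its distance is <= threshold and the
    residues are at least min_sep apart in sequence (near-diagonal pairs are
    trivially close and are skipped). Both defaults are assumptions.
    """
    contacts: Set[Contact] = set()
    for i in range(len(frame)):
        for j in range(i + min_sep, len(frame)):
            if math.dist(frame[i], frame[j]) <= threshold:
                contacts.add((i, j))
    return contacts


def contact_formation_order(trajectory: Sequence[Sequence[Coord]],
                            threshold: float = 8.0, min_sep: int = 3,
                            persist: int = 1) -> List[Tuple[Contact, int]]:
    """Order contacts by the first frame from which they stay formed for at
    least `persist` consecutive frames (a hypothetical event criterion).
    Contacts that never persist that long are omitted."""
    if persist < 1:
        raise ValueError("persist must be >= 1")
    maps = [contact_map(f, threshold, min_sep) for f in trajectory]
    onset: Dict[Contact, int] = {}
    for c in set().union(*maps):
        run = 0  # length of the current run of consecutive frames containing c
        for t, m in enumerate(maps):
            run = run + 1 if c in m else 0
            if run >= persist:
                onset[c] = t - persist + 1
                break
    # Sort by onset frame, breaking ties by residue indices.
    return sorted(onset.items(), key=lambda item: (item[1], item[0]))


# Toy 5-residue "trajectory" with 3 frames (made-up coordinates, not real data).
toy_trajectory = [
    [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (15.2, 0, 0)],      # extended
    [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (0, 7.6, 0)],      # partly folded
    [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (3.8, 3.8, 3.8)],  # compact
]

print(contact_formation_order(toy_trajectory))
# -> [((0, 3), 1), ((0, 4), 1), ((1, 4), 2)]
```

Requiring a contact to persist for several frames before counting it as an event is one simple way to suppress transient contacts in a noisy simulation. The spatio-temporal association patterns described in the article are richer than single contacts; this sketch does not attempt to mine them.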
