In situ data analytics and indexing of protein trajectories

The transition toward exascale computing will be accompanied by a performance dichotomy: peak computational performance will increase rapidly, while I/O performance will grow slowly or stagnate. As a result, the rate at which data are generated will grow much faster than the rate at which data can be written to and read from disk. Molecular dynamics (MD) simulations will soon face this I/O bottleneck on the next generation of supercomputers. This article targets MD simulations at the exascale and proposes a novel technique for in situ data analysis and indexing of MD trajectories. Our technique maps the substructures of individual trajectories (i.e., α‐helices, β‐strands) to metadata, frame by frame. The metadata capture the conformational properties of the substructures. The ensemble of metadata can be used for automatic, strategic analysis within a trajectory or across trajectories, without manually identifying the portions of trajectories in which critical changes take place. We demonstrate our technique's effectiveness by applying it to 26.3k helices and 31.2k strands from 9917 PDB proteins and by providing three empirical case studies. © 2017 Wiley Periodicals, Inc.
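
The frame-by-frame mapping described above can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not state which metadata are computed, so the two descriptors used here (end-to-end distance and radius of gyration), the function names, the toy coordinates, and the change threshold are all assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def end_to_end_distance(coords: Sequence[Point]) -> float:
    """Distance between the first and last atom of a substructure."""
    return math.dist(coords[0], coords[-1])


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """Root-mean-square distance of the atoms from their centroid."""
    n = len(coords)
    centroid = tuple(sum(p[k] for p in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)


def frame_metadata(coords: Sequence[Point]) -> Tuple[float, float]:
    """Hypothetical metadata: a small tuple summarizing one frame's shape."""
    return (end_to_end_distance(coords), radius_of_gyration(coords))


def flag_changes(metadata: Sequence[Tuple[float, ...]], threshold: float) -> List[int]:
    """Indices of frames whose metadata moved more than `threshold`
    (Euclidean distance in metadata space) relative to the previous frame."""
    return [
        i
        for i in range(1, len(metadata))
        if math.dist(metadata[i - 1], metadata[i]) > threshold
    ]


def rod(spacing: float, n_atoms: int = 4) -> List[Point]:
    """Toy substructure: atoms evenly spaced along the x axis (assumption)."""
    return [(i * spacing, 0.0, 0.0) for i in range(n_atoms)]


# Toy trajectory: the substructure contracts between frame 1 and frame 2.
trajectory = [rod(1.5), rod(1.5), rod(1.0), rod(1.0)]

# Metadata are computed one frame at a time, as an in situ analysis would do
# while the simulation is still producing frames.
index = [frame_metadata(frame) for frame in trajectory]

changed_frames = flag_changes(index, threshold=0.5)
print(changed_frames)
```

In this sketch only the compact metadata tuples need to be retained to locate the frame where the conformation changes, which is the motivation for indexing trajectories instead of re-reading every full frame from disk.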
