Biomolecular committor probability calculation enabled by processing in network storage

Computationally complex and data-intensive atomic-scale biomolecular simulation is enabled via processing in network storage (PINS), a novel distributed system framework designed to overcome the bandwidth, compute, storage, organizational, and security challenges inherent to wide-area computation and storage grids. PINS is presented as an effective and scalable scientific simulation framework that meets the unbounded requirements of a 'user of infinite need'. Its novel hybrid database-filesystem architecture enables the high-throughput computation and data generation required by our scientific target. Biomolecular simulation methods are mapped onto the primary PINS components: client tools, a hybrid database/file management service (GEMS), a computation engine (Condor), a virtual file system adapter (Parrot), and local file servers (Chirp). Performance of the PINS prototype is reported for the committor probability calculation of a solvated protein domain, which required 500 independent simulations and generated over 1,000,000 output files.
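The committor p_B(x) of a configuration x is the probability that a trajectory started from x, with freshly drawn random noise or velocities, reaches product state B before reactant state A. It is estimated by launching many independent trajectories from the same starting point and counting how many commit to B. Because each run is independent, the workload parallelizes naturally across a pool of independent jobs. For n trajectories, the binomial standard error of the estimate is sqrt(p(1 - p)/n), which is at most about 0.022 for n = 500. The sketch below illustrates only this statistical idea. It assumes a hypothetical one-dimensional double-well toy model, V(x) = (x^2 - 1)^2, integrated with overdamped Langevin dynamics. The basin boundaries, temperature, timestep, and all function names are illustrative assumptions. It is not the protein molecular dynamics or the PINS/GEMS workflow used in this work.

```python
import math
import random


def grad_potential(x: float) -> float:
    """Derivative of the hypothetical toy double-well potential V(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x * x - 1.0)


def run_trajectory(x0, rng, kT=0.5, dt=1e-3, a=-0.8, b=0.8, max_steps=1_000_000):
    """Integrate overdamped Langevin dynamics (Euler-Maruyama, friction = 1)
    from x0 until the walker enters basin A (x <= a) or basin B (x >= b).
    Returns True if B is reached first. Boundaries a, b are assumptions."""
    x = x0
    noise = math.sqrt(2.0 * kT * dt)
    for _ in range(max_steps):
        if x <= a:
            return False
        if x >= b:
            return True
        x += -grad_potential(x) * dt + noise * rng.gauss(0.0, 1.0)
    raise RuntimeError("trajectory did not commit to A or B")


def estimate_committor(x0, n_traj=500, seed=0, **kwargs):
    """Estimate p_B(x0) as the fraction of n_traj independent trajectories
    that reach B before A, together with its binomial standard error."""
    rng = random.Random(seed)
    hits = sum(run_trajectory(x0, rng, **kwargs) for _ in range(n_traj))
    p = hits / n_traj
    stderr = math.sqrt(p * (1.0 - p) / n_traj)
    return p, stderr


if __name__ == "__main__":
    # Starting points on the reactant side, at the barrier top, and on the product side.
    for x0 in (-0.5, 0.0, 0.5):
        p, se = estimate_committor(x0)
        print(f"x0 = {x0:+.1f}: p_B = {p:.3f} +/- {se:.3f}")
```

At the barrier top (x0 = 0) the symmetric toy potential gives p_B near 0.5, and p_B rises toward 1 as x0 approaches basin B. In a production setting, each call to `run_trajectory` would correspond to one independent simulation job.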
