Fast analysis of molecular dynamics trajectories with graphics processing units - Radial distribution function histogramming

The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate-limiting step in the calculation of the RDF is building a histogram of the distances between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy, and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, each utilizing the specific hardware features found on a different generation of GPUs. We take advantage of the larger shared memory and the atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The final version of the algorithm, running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs, was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
