Strong scaling of general-purpose molecular dynamics simulations on GPUs

Abstract: We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, 2013). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5×.

[1]  Miguel Fuentes-Cabrera, et al. Coexistence of spinodal instability and thermal nucleation in thin-film rupture: insights from molecular levels, 2014, Physical review. E, Statistical, nonlinear, and soft matter physics.

[2]  M J Harvey, et al. ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale, 2009, Journal of chemical theory and computation.

[3]  S. Glotzer, et al. Stability of the double gyroid phase to nanoparticle polydispersity in polymer-tethered nanosphere systems, 2009.

[4]  K. Binder, et al. Langevin dynamics simulations of a two-dimensional colloidal crystal under confinement and shear, 2012, Physical review. E, Statistical, nonlinear, and soft matter physics.

[5]  Kevin Dowd. High Performance Computing, 2015, Communications in Computer and Information Science.

[6]  S. Wereley, et al. soft matter, 2019, Science.

[7]  Jens Glaser, et al. Universality of block copolymer melts, 2014, Physical review letters.

[8]  Tong Liu, et al. The development of Mellanox/NVIDIA GPUDirect over InfiniBand—a new model for GPU to GPU communications, 2011, Computer Science - Research and Development.

[10]  S Torquato, et al. Probing the limitations of isotropic pair potentials to produce ground-state structural extremes via inverse statistical mechanics, 2013, Physical review. E, Statistical, nonlinear, and soft matter physics.

[11]  Arthi Jayaraman, et al. Decreasing Polymer Flexibility Improves Wetting and Dispersion of Polymer-Grafted Particles in a Chemically Identical Polymer Matrix, 2014, ACS Macro Letters.

[12]  Sharon C Glotzer, et al. Thermal and athermal three-dimensional swarms of self-propelled particles, 2012, Physical review. E, Statistical, nonlinear, and soft matter physics.

[13]  Daniel Reith, et al. GPU Based Molecular Dynamics Simulations of Polymer Rings in Concentrated Solution: Structure and Scaling (Statistical Physics and Topology of Polymers with Ramifications to Structure and Function of DNA and Proteins), 2011.

[14]  Chi-Hang Lam, et al. Crossover to surface flow in supercooled unentangled polymer films, 2013, Physical review. E, Statistical, nonlinear, and soft matter physics.

[15]  Diwakar Shukla, et al. OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation, 2013, Journal of chemical theory and computation.

[16]  Nathan Bell, et al. Thrust: A Productivity-Oriented Library for CUDA, 2012.

[17]  Sharon C. Glotzer, et al. Pseudo-random number generation for Brownian Dynamics and Dissipative Particle Dynamics simulations on GPU devices, 2011, J. Comput. Phys.

[18]  Steve Plimpton, et al. Fast parallel algorithms for short-range molecular dynamics, 1993.

[19]  M. Klein, et al. Constant pressure molecular dynamics algorithms, 1994.

[20]  W Michael Brown, et al. Rupture mechanism of liquid crystal thin films realized by large-scale molecular simulations, 2014, Nanoscale.

[21]  Kurt Binder, et al. Anomalous structure and scaling of ring polymer brushes, 2011, 1104.4943.

[22]  Felix Höfling, et al. Highly accelerated simulations of glassy dynamics using GPUs: Caveats on limited floating-point precision, 2009, Comput. Phys. Commun.

[23]  Gregory A Voth, et al. Highly Scalable and Memory Efficient Ultra-Coarse-Grained Molecular Dynamics Simulations, 2014, Journal of chemical theory and computation.

[24]  David Skinner. Performance monitoring of parallel scientific applications, 2005.

[25]  Alex Travesset, et al. Folding and stability of helical bundle proteins from coarse‐grained models, 2013, Proteins.

[26]  Michael F Hagan, et al. Viral genome structures are optimal for capsid assembly, 2013, eLife.

[27]  Steven J. Plimpton, et al. Implementing molecular dynamics on hybrid high performance computers - Particle-particle particle-mesh, 2012, Comput. Phys. Commun.

[28]  Wataru Shinoda, et al. Micellization Studied by GPU-Accelerated Coarse-Grained Molecular Dynamics, 2011, Journal of chemical theory and computation.

[29]  Carsten Kutzner, et al. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation, 2008, Journal of chemical theory and computation.

[30]  Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.

[31]  Laxmikant V. Kalé, et al. Scalable molecular dynamics with NAMD, 2005, J. Comput. Chem.

[32]  Joshua A. Anderson, et al. General purpose molecular dynamics simulations fully implemented on graphics processing units, 2008, J. Comput. Phys.

[33]  Mark E. Tuckerman, et al. Explicit reversible integrators for extended systems dynamics, 1996.

[35]  George E. Karniadakis, et al. Accelerating dissipative particle dynamics simulations on GPUs: Algorithms, numerics and applications, 2013, Comput. Phys. Commun.

[36]  Klaus Schulten, et al. Accelerating Molecular Modeling Applications with GPU Computing, 2009.

[37]  Peng Wang, et al. Implementing molecular dynamics on hybrid high performance computers - short range forces, 2011, Comput. Phys. Commun.

[39]  emontmej, et al. High Performance Computing, 2003, Lecture Notes in Computer Science.

[40]  Dhabaleswar K. Panda, et al. GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation, 2014, IEEE Transactions on Parallel and Distributed Systems.

[41]  Gary J. Nurt. 1993 International Conference on Parallel Processing, 1993.

[42]  C. Tanford. Macromolecules, 1994, Nature.

[43]  Chua-Huang Huang, et al. 2003 International Conference on Parallel Processing Workshops, 2003.

[44]  Berk Hess, et al. A flexible algorithm for calculating pair interactions on SIMD architectures, 2013, Comput. Phys. Commun.

[45]  Siegfried Schmauder, et al. Comput. Mater. Sci., 1998.

[46]  Jeffrey K. Hollingsworth. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017, SC.

[47]  D. Y. Yoon, et al. An optimized united atom model for simulations of polymethylene melts, 1995.

[48]  Christian Trott, et al. LAMMPScuda - a new GPU accelerated Molecular Dynamics Simulations Package and its Application to Ion-Conducting Glasses, 2012.

[49]  Julien Dorier, et al. Effects of supercoiling on enhancer–promoter contacts, 2014, Nucleic acids research.

[50]  Michela Taufer, et al. FENZI: GPU-Enabled Molecular Dynamics Simulations of Large Membrane Regions Based on the CHARMM Force Field and PME, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.

[51]  Dominic Roehm, et al. Lattice Boltzmann simulations on GPUs with ESPResSo, 2012.