Towards microsecond biological molecular dynamics simulations on hybrid processors

Biomolecular simulations are becoming an increasingly important component of molecular biochemistry and biophysics investigations. Performance improvements in simulations based on molecular dynamics (MD) codes are widely desired, driven in particular by the rapid growth of biological data that has followed improvements in experimental techniques. Unfortunately, the factors that enabled past performance improvements in MD simulations, particularly rising microprocessor clock frequencies, have stalled. Hence, novel software and hardware solutions are being explored for accelerating popular MD codes. In this paper, we describe our efforts to port and optimize LAMMPS, a popular MD framework, on hybrid processors: multi-core processors accelerated by graphics processing units (GPUs). Our implementation ports the computationally expensive non-bonded interaction terms to the GPUs and overlaps the computation on the CPU and GPUs. This functionality is built on top of the Message Passing Interface (MPI), which allows multi-level parallelism to be extracted even at the workstation level with multi-core CPUs and allows the implementation to extend to GPU clusters. We report results for a number of typically sized biomolecular systems and analyze performance on three generations of NVIDIA GPUs. Our implementation achieves up to 30–40 ns/day throughput on a single workstation as well as a significant speedup over the Cray XT5, a high-end supercomputing platform. Moreover, detailed analysis of the implementation indicates that further code optimization and improvements in GPUs will allow ∼100 ns/day throughput on workstations and inexpensive GPU clusters, putting the widely desired microsecond simulation time-scale within reach of a large user community.
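To make the throughput figures concrete, the following minimal sketch converts a sustained throughput in ns/day into the wall-clock time needed to reach one microsecond of simulated time. The function name `days_to_simulate` is hypothetical, and the calculation assumes an uninterrupted run at constant throughput, which real simulations may not achieve.

```python
NS_PER_US = 1000.0  # 1 microsecond = 1000 nanoseconds


def days_to_simulate(target_ns: float, throughput_ns_per_day: float) -> float:
    """Wall-clock days needed to reach target_ns at a sustained throughput.

    Assumption: the throughput stays constant for the whole run.
    """
    if throughput_ns_per_day <= 0:
        raise ValueError("throughput must be positive")
    return target_ns / throughput_ns_per_day


if __name__ == "__main__":
    # Throughputs quoted in the abstract: 30-40 ns/day now, ~100 ns/day projected.
    for rate in (30.0, 40.0, 100.0):
        days = days_to_simulate(NS_PER_US, rate)
        print(f"{rate:5.0f} ns/day -> {days:.1f} days per microsecond")
```

Under these assumptions, a microsecond trajectory takes roughly 25 to 33 days at 30–40 ns/day and about 10 days at ∼100 ns/day.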
