Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations

Biomolecular simulations have traditionally benefited from increases in the processor clock speed and coarse-grain inter-node parallelism on large-scale clusters. With stagnating clock frequencies, the evolutionary path for performance of microprocessors is maintained by virtue of core multiplication. Graphical processing units (GPUs) offer revolutionary performance potential at the cost of increased programming complexity. Furthermore, it has been extremely challenging to effectively utilize heterogeneous resources (host processor and GPU cores) for scientific simulations, as underlying systems, programming models and tools are continually evolving. In this paper, we present a parametric study demonstrating approaches to exploit resources of heterogeneous systems to reduce time-to-solution of a production-level application for biological simulations. By overlapping and pipelining computation and communication, we observe up to 10-fold application acceleration in multi-core and multi-GPU environments illustrating significant performance improvements over code acceleration approaches, where the host-to-accelerator ratio is static, and is constrained by a given algorithmic implementation.

[1]  Sadaf R. Alam,et al.  Biomolecular simulations on petascale: promises and challenges , 2006 .

[2]  Vijay S. Pande,et al.  Accelerating molecular dynamic simulation on graphics processing units , 2009, J. Comput. Chem..

[3]  Klaus Schulten,et al.  Multilevel summation of electrostatic potentials using graphics processing units , 2009, Parallel Comput..

[4]  Hong Jiang,et al.  Performance and cost effectiveness of a cluster of workstations and MD-GRAPE 2 for MD simulations , 2003, Second International Symposium on Parallel and Distributed Computing, 2003. Proceedings..

[5]  John D. Owens,et al.  GPU Computing , 2008, Proceedings of the IEEE.

[6]  Satoshi Matsuoka,et al.  Fast Conjugate Gradients with Multiple GPUs , 2009, ICCS.

[7]  J. Xu OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems , 2009 .

[8]  Richard W. Vuduc,et al.  Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.

[9]  Steve Plimpton,et al.  Fast parallel algorithms for short-range molecular dynamics , 1993 .

[10]  Erik Lindholm,et al.  NVIDIA Tesla: A Unified Graphics and Computing Architecture , 2008, IEEE Micro.

[11]  Edmond Chow,et al.  Desmond Performance on a Cluster of Multicore Processors , 2008 .

[12]  Shinnosuke Obi,et al.  Fast multipole methods on a cluster of GPUs for the meshless simulation of turbulence , 2009, Comput. Phys. Commun..

[13]  Sadaf R. Alam,et al.  Cray XT4: an early evaluation for petascale scientific simulation , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[14]  Sadaf R. Alam,et al.  Towards microsecond biological molecular dynamics simulations on hybrid processors , 2010, 2010 International Conference on High Performance Computing & Simulation.

[15]  T. Darden,et al.  Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems , 1993 .

[16]  John A. Gunnels,et al.  Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[17]  Sanjay J. Patel,et al.  Implicitly Parallel Programming Models for Thousand-Core Microprocessors , 2007, 2007 44th ACM/IEEE Design Automation Conference.

[18]  Philippas Tsigas,et al.  A Practical Quicksort Algorithm for Graphics Processors , 2008, ESA.

[19]  David R. Kaeli,et al.  Multi GPU implementation of iterative tomographic reconstruction algorithms , 2009, 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro.

[20]  Charles Archer,et al.  Breaking the petaflops barrier , 2009, IBM J. Res. Dev..

[21]  Klaus Schulten,et al.  Accelerating Molecular Modeling Applications with GPU Computing , 2009 .

[22]  Martin C. Herbordt,et al.  Computing Models for FPGA-Based Accelerators , 2008, Computing in Science & Engineering.

[23]  Joshua A. Anderson,et al.  General purpose molecular dynamics simulations fully implemented on graphics processing units , 2008, J. Comput. Phys..

[24]  Sadaf R. Alam,et al.  Using FPGA Devices to Accelerate Biomolecular Simulations , 2007, Computer.

[25]  M J Harvey,et al.  An Implementation of the Smooth Particle Mesh Ewald Method on GPU Hardware. , 2009, Journal of chemical theory and computation.

[26]  Ibm Blue,et al.  Overview of the IBM Blue Gene/P Project , 2008, IBM J. Res. Dev..

[27]  William Gropp,et al.  An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.

[28]  John L. Klepeis,et al.  Anton, a special-purpose machine for molecular dynamics simulation , 2007, ISCA '07.