Adapting a message-driven parallel application to GPU-accelerated clusters

Graphics processing units (GPUs) have become an attractive option for accelerating scientific computations as a result of advances in the performance and flexibility of GPU hardware, and due to the availability of GPU software development tools targeting general purpose and scientific computation. However, effective use of GPUs in clusters presents a number of application development and system integration challenges. We describe strategies for the decomposition and scheduling of computation among CPU cores and GPUs, and techniques for overlapping communication and CPU computation with GPU kernel execution. We report the adaptation of these techniques to NAMD, a widely-used parallel molecular dynamics simulation package, and present performance results for a 64-core 64-GPU cluster.

[1]  Peter L. Freddolino,et al.  Molecular dynamics simulations of the complete satellite tobacco mosaic virus. , 2006, Structure.

[2]  Robert Strzodka,et al.  Exploring weak scalability for FEM calculations on a GPU-enhanced cluster , 2007, Parallel Comput..

[3]  Laxmikant V. Kale,et al.  NAMD2: Greater Scalability for Parallel Molecular Dynamics , 1999 .

[4]  John A. Gunnels,et al.  Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[5]  Pat Hanrahan,et al.  ClawHMMER: A Streaming HMMer-Search Implementatio , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[6]  Joshua A. Anderson,et al.  General purpose molecular dynamics simulations fully implemented on graphics processing units , 2008, J. Comput. Phys..

[7]  Seth Copen Goldstein,et al.  Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[8]  Pat Hanrahan,et al.  ClawHMMER: A Streaming HMMer-Search Implementation , 2005, SC.

[9]  Laxmikant V. Kalé,et al.  NAMD: a Parallel, Object-Oriented Molecular Dynamics Program , 1996, Int. J. High Perform. Comput. Appl..

[10]  Kevin Skadron,et al.  A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors , 2007, GH '07.

[11]  Klaus Schulten,et al.  Accelerating Molecular Modeling Applications with GPU Computing , 2009 .

[12]  Laxmikant V. Kalé,et al.  Scalable molecular dynamics with NAMD , 2005, J. Comput. Chem..

[13]  Kevin Skadron,et al.  Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[14]  John D. Owens,et al.  GPU Computing , 2008, Proceedings of the IEEE.

[15]  Arie E. Kaufman,et al.  GPU Cluster for High Performance Computing , 2004, Proceedings of the ACM/IEEE SC2004 Conference.

[16]  Laxmikant V. Kale,et al.  MDScope - a visual computing environment for structural biology , 1995 .

[17]  Scott Pakin,et al.  Fast messages: efficient, portable communication for workstation clusters and MPPs , 1997, IEEE Concurrency.

[18]  Wei Huang,et al.  Design of High Performance MVAPICH2: MPI2 over InfiniBand , 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).

[19]  Klaus Schulten,et al.  GPU acceleration of cutoff pair potentials for molecular modeling applications , 2008, CF '08.

[20]  Dhabaleswar K. Panda,et al.  Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand , 2006, 2006 International Conference on Parallel Processing (ICPP'06).

[21]  M. McCool Data-Parallel Programming on the Cell BE and the GPU using the RapidMind Development Platform , 2006 .

[22]  SugermanJeremy,et al.  Brook for GPUs , 2004 .

[23]  Peter L. Freddolino,et al.  Ten-microsecond molecular dynamics simulation of a fast-folding WW domain. , 2008, Biophysical journal.

[24]  Eric Darve,et al.  N-Body Simulations on GPUs , 2007, ArXiv.

[25]  Laxmikant V. Kalé,et al.  NAMD: Biomolecular Simulation on Thousands of Processors , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[26]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.

[27]  Pedro Trancoso,et al.  Initial Experiences Porting a Bioinformatics Application to a Graphics Processor , 2005, Panhellenic Conference on Informatics.

[28]  Laxmikant V. Kale,et al.  Programming Petascale Applications with Charm , 2007 .

[29]  Laxmikant V. Kalé,et al.  Scaling applications to massively parallel machines using Projections performance analysis tool , 2006, Future Gener. Comput. Syst..

[30]  Ivan S Ufimtsev,et al.  Quantum Chemistry on Graphical Processing Units. 1. Strategies for Two-Electron Integral Evaluation. , 2008, Journal of chemical theory and computation.

[31]  Laxmikant V. Kalé,et al.  Performance and modularity benefits of message-driven execution , 2004, J. Parallel Distributed Comput..

[32]  Bryan Chan,et al.  Shader algebra , 2004, SIGGRAPH 2004.