Improving the Performance of Dynamical Simulations Via Multiple Right-Hand Sides

This paper presents an algorithmic approach for improving the performance of many types of stochastic dynamical simulations. The approach is to redesign existing algorithms that use sparse matrix-vector products (SPMV) with single vectors so that they instead use a more efficient kernel, the generalized SPMV (GSPMV), which computes with multiple vectors simultaneously. We show how to redesign a dynamical simulation to exploit GSPMV in a way that is not initially obvious, because only one vector is available at a time. We study the performance of GSPMV as a function of the number of vectors, and demonstrate the use of GSPMV in the Stokesian dynamics method for simulating the motion of macromolecules in the cell. Specifically, for our application, we find that with modern multicore Intel microprocessors in clusters of up to 64 nodes, we can typically multiply by 8 to 16 vectors in only twice the time required to multiply by a single vector. After redesigning the Stokesian dynamics algorithm to exploit GSPMV, we measure a 30 percent speedup in single-node, data-parallel simulations.
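To make the distinction between SPMV and GSPMV concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes the matrix is stored in compressed sparse row (CSR) form and that the block of vectors is stored row-major; the function names `spmv_csr` and `gspmv_csr` and the small example matrix are hypothetical and chosen only for illustration. The point it shows is that GSPMV reads each nonzero of the matrix once and reuses it for every vector, which is why multiplying by several vectors can cost much less than several separate SPMV calls.

```python
from typing import List, Sequence


def spmv_csr(indptr: Sequence[int], indices: Sequence[int],
             data: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = A @ x for a CSR matrix A and a single vector x."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def gspmv_csr(indptr: Sequence[int], indices: Sequence[int],
              data: Sequence[float],
              X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Y = A @ X for a CSR matrix A and a block of m vectors.

    X is row-major (assumed layout): X[j][v] is entry j of vector v.
    Each nonzero data[k] is loaded once and applied to all m vectors.
    """
    n_rows = len(indptr) - 1
    m = len(X[0]) if X else 0
    Y = [[0.0] * m for _ in range(n_rows)]
    for row in range(n_rows):
        out = Y[row]
        for k in range(indptr[row], indptr[row + 1]):
            a = data[k]            # one matrix load ...
            x_row = X[indices[k]]
            for v in range(m):     # ... reused for every vector
                out[v] += a * x_row[v]
    return Y


# Hypothetical 3x3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]

# Two vectors stored side by side: x0 = [1, 2, 3], x1 = [0, 1, 2]
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]

Y = gspmv_csr(indptr, indices, data, X)
print(Y)
```

Each column of `Y` equals the result of calling `spmv_csr` on the corresponding column of `X`; the two kernels differ only in how often the matrix entries are traversed. A production kernel would use a compiled language with vectorized inner loops, so this sketch illustrates the data reuse pattern rather than the performance figures reported above.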
