Operation Stacking for Ensemble Computations With Variable Convergence

Sparse matrix operations achieve only a small fraction of peak CPU speed because their specialized, index-based matrix representations degrade cache utilization by imposing irregular memory accesses and increasing the overall number of accesses. Compounding the problem, the small number of floating-point operations per sparse iteration leads to low floating-point pipeline utilization. Operation stacking addresses these problems for large ensemble computations that solve multiple systems of linear equations with identical sparsity structure. By combining the data of multiple problems and solving them as one, operation stacking improves locality, reduces cache misses, and increases floating-point pipeline utilization. Operation stacking also requires less memory bandwidth because it involves fewer index array accesses. In this paper we present the Operation Stacking Framework (OSF), an object-oriented framework that provides runtime and code generation support for the development of stacked iterative solvers. OSF’s runtime component provides an iteration engine that supports efficient ejection of converged problems from the stack. It separates the specific solver algorithm from the coding conventions and data representations that are necessary to implement stacking. Stacked solvers created with OSF can be used transparently, without significant changes to existing applications. Our results show that stacking can provide speedups of up to 1.94×, with an average of 1.46×, even when the number of iterations required to converge varies widely within a stack of problems. Our evaluation shows that these improvements correlate with better cache utilization, improved floating-point utilization, and reduced memory accesses.
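The core idea can be illustrated with a small sketch. In a conventional CSR sparse matrix-vector multiply, every nonzero costs two irregular index lookups (`indices[p]`, then `x[col]`) for a single multiply-add. In the stacked layout sketched below (a hypothetical illustration, not OSF's actual data representation or API), each nonzero stores k values, one per problem in the stack, so a single pass over the index arrays services all k systems and the cost of the irregular accesses is amortized k ways:

```python
import numpy as np

def spmv_stacked(indptr, indices, vals, xs):
    """Stacked CSR SpMV sketch.

    indptr, indices: shared CSR index arrays (one sparsity structure).
    vals: (nnz, k) array -- k stacked values per nonzero position.
    xs:   (n, k) array -- k stacked input vectors.
    Returns ys: (n, k) array of k stacked results.
    """
    n, k = xs.shape
    ys = np.zeros((n, k))
    for row in range(n):
        for p in range(indptr[row], indptr[row + 1]):
            col = indices[p]
            # One pair of index lookups now feeds k multiply-adds,
            # and vals[p] / xs[col] are contiguous in memory.
            ys[row] += vals[p] * xs[col]
    return ys

# Example: two 3x3 systems sharing a single sparsity pattern.
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
vals    = np.array([[4., 1.], [1., 2.], [3., 5.], [2., 1.], [5., 3.]])
xs      = np.ones((3, 2))
print(spmv_stacked(indptr, indices, vals, xs))
```

Ejecting a converged problem then amounts to dropping its column from `vals` and `xs` so the remaining problems keep the dense, contiguous layout.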
