When cache blocking of sparse matrix vector multiply works and why

We present new performance models and more compact data structures for cache blocking when applied to sparse matrix-vector multiply (SpM × V). We extend our prior models by relaxing the assumption that the vectors fit in cache and find that the new models are accurate enough to predict optimum block sizes. In addition, we determine criteria that predict when cache blocking improves performance. We conclude with architectural suggestions that would make memory systems execute SpM × V faster.
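As a rough illustration of the technique the abstract refers to, the sketch below splits a CSR matrix into rectangular cache blocks and multiplies one block at a time, so that each block touches only a bounded slice of the source and destination vectors. It is a minimal sketch, not the data structure proposed in the paper: the function names (`csr_spmv`, `split_into_cache_blocks`, `cache_blocked_spmv`), the per-block CSR layout, and the block sizes are all assumptions made for this example.

```python
def csr_spmv(n_rows, row_ptr, col_idx, vals, x):
    """Unblocked reference: y = A * x for a matrix in CSR format."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def split_into_cache_blocks(n_rows, n_cols, row_ptr, col_idx, vals,
                            block_rows, block_cols):
    """Partition a CSR matrix into block_rows x block_cols cache blocks.

    Each non-empty block is stored as its own small CSR matrix whose
    column indices are local to the block (hypothetical layout).
    """
    blocks = []
    for r0 in range(0, n_rows, block_rows):
        r1 = min(r0 + block_rows, n_rows)
        for c0 in range(0, n_cols, block_cols):
            c1 = min(c0 + block_cols, n_cols)
            b_ptr, b_idx, b_val = [0], [], []
            for i in range(r0, r1):
                for k in range(row_ptr[i], row_ptr[i + 1]):
                    if c0 <= col_idx[k] < c1:
                        b_idx.append(col_idx[k] - c0)  # local column index
                        b_val.append(vals[k])
                b_ptr.append(len(b_idx))
            if b_idx:  # skip blocks with no nonzeros
                blocks.append((r0, c0, b_ptr, b_idx, b_val))
    return blocks


def cache_blocked_spmv(n_rows, blocks, x):
    """y = A * x, processing one cache block at a time.

    Within a block, only x[c0 : c0 + block_cols] and
    y[r0 : r0 + block_rows] are accessed.
    """
    y = [0.0] * n_rows
    for r0, c0, b_ptr, b_idx, b_val in blocks:
        for local_row in range(len(b_ptr) - 1):
            acc = 0.0
            for k in range(b_ptr[local_row], b_ptr[local_row + 1]):
                acc += b_val[k] * x[c0 + b_idx[k]]
            y[r0 + local_row] += acc
    return y


# Small 4 x 6 example matrix in CSR form (values chosen arbitrarily).
n_rows, n_cols = 4, 6
row_ptr = [0, 2, 3, 5, 7]
col_idx = [0, 4, 1, 2, 5, 0, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

blocks = split_into_cache_blocks(n_rows, n_cols, row_ptr, col_idx, vals,
                                 block_rows=2, block_cols=3)
y_blocked = cache_blocked_spmv(n_rows, blocks, x)
y_reference = csr_spmv(n_rows, row_ptr, col_idx, vals, x)
```

In a tuned implementation the block dimensions would be chosen so that the touched vector slices fit in cache. Which sizes are best, and whether blocking helps at all for a given matrix and machine, is the question the performance models in this paper address.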
