Overhead-Conscious Format Selection for SpMV-Based Applications

Sparse matrix-vector multiplication (SpMV) is an important kernel in many applications and is often the major performance bottleneck. The storage format of a sparse matrix critically affects SpMV performance. Although previous studies have addressed selecting an appropriate format for a given matrix, they have ignored the overhead of runtime prediction and of format conversion. For many common uses of SpMV, this overhead is part of the execution time and may outweigh the benefits of the new format; ignoring it makes the predictions of previous solutions frequently suboptimal and sometimes inferior to performing no conversion at all. The overhead is, however, difficult to account for: along with the benefits of a new format, it varies from matrix to matrix and from application to application. This work proposes a solution. It first explores the pros and cons of various possible treatments of the overhead in the format-selection problem. It then presents an explicit approach that employs several regression models to capture how the overhead and the benefits of format conversion influence overall program performance. It further proposes a two-stage lazy-and-light scheme that helps control the risks of format mispredictions while maximizing the overall benefits of format conversion. Experiments show that the technique outperforms previous techniques significantly, improving the overall performance of applications by 1.14X to 1.43X, well beyond the 0.82X to 1.24X upper-bound speedups that overhead-oblivious methods could deliver.
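
To make the tradeoff concrete, below is a minimal sketch of the break-even reasoning an overhead-conscious selector must perform. It is an illustration under assumptions, not the paper's actual method: the function, its parameters (t_spmv_cur, t_spmv_new, t_convert, t_predict, n_iters), and the example timings are all hypothetical; in the paper, regression models would supply the predicted costs and benefits.

    def worth_converting(t_spmv_cur, t_spmv_new, t_convert, t_predict, n_iters):
        """Return True if switching formats pays off over the remaining SpMV calls.

        t_spmv_cur : predicted per-call SpMV time in the current format (s)
        t_spmv_new : predicted per-call SpMV time in the candidate format (s)
        t_convert  : predicted one-time cost of converting the matrix (s)
        t_predict  : runtime cost of running the format predictor itself (s)
        n_iters    : number of SpMV calls the application will still perform
        """
        keep_cost = n_iters * t_spmv_cur                              # stay as-is
        switch_cost = t_predict + t_convert + n_iters * t_spmv_new    # pay overhead up front
        return switch_cost < keep_cost

    # Hypothetical example: an iterative solver with 2000 SpMV calls remaining.
    # keep_cost = 4.0 s; switch_cost = 0.01 + 0.4 + 3.0 = 3.41 s -> convert.
    print(worth_converting(2.0e-3, 1.5e-3, 0.4, 0.01, 2000))   # True
    # With only 500 calls left, the conversion no longer amortizes -> keep.
    print(worth_converting(2.0e-3, 1.5e-3, 0.4, 0.01, 500))    # False

The two-stage lazy-and-light scheme described in the abstract can be read as refining exactly this decision: defer the conversion until the predicted benefit over the remaining iterations safely exceeds the combined prediction and conversion overhead.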
