Autotuning Runtime Specialization for Sparse Matrix-Vector Multiplication
Samuel N. Kamin | María Jesús Garzarán | Barış Aktemur | Mustafa Furkan Kıraç | Buse Yilmaz
[1] Chung-chieh Shan, et al. Shonan challenge for generative programming: short position paper, 2013, PEPM '13.
[2] Dominik Grewe, et al. Automatically generating and tuning GPU code for sparse matrix-vector multiplication from a high-level representation, 2011, GPGPU-4.
[3] Martin Odersky, et al. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs, 2010, GPCE '10.
[4] Ping Guo, et al. A Performance Modeling and Optimization Analysis Tool for Sparse Matrix-Vector Multiplication on GPUs, 2014, IEEE Transactions on Parallel and Distributed Systems.
[5] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[6] Endong Wang, et al. Intel Math Kernel Library, 2014.
[7] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[8] Xing Liu, et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors, 2013, ICS '13.
[9] John R. Gilbert, et al. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks, 2009, SPAA '09.
[10] Nectarios Koziris, et al. Understanding the Performance of Sparse Matrix-Vector Multiplication, 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).
[11] Mary W. Hall, et al. Loop and data transformations for sparse matrix code, 2015, PLDI.
[12] Samuel N. Kamin, et al. Jumbo: run-time code generation for Java and its applications, 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[13] Kalyan Veeramachaneni, et al. Autotuning algorithmic choice for input sensitivity, 2015, PLDI.
[14] Andrew Lumsdaine, et al. Accelerating sparse matrix computations via data compression, 2006, ICS '06.
[15] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[16] Kunle Olukotun, et al. Optimizing data structures in high-level programs, 2013.
[17] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[18] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[19] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[20] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[21] Nectarios Koziris, et al. Exploiting compression opportunities to improve SpMxV performance on shared memory systems, 2010, TACO.
[22] Ninghui Sun, et al. SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication, 2013, PLDI.
[23] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[24] Alistair P. Rendell, et al. Runtime sparse matrix format selection, 2010, ICCS.
[25] John M. Mellor-Crummey, et al. Optimizing Sparse Matrix–Vector Product Computations Using Unroll and Jam, 2004, Int. J. High Perform. Comput. Appl.
[26] James Demmel, et al. Statistical Models for Empirical Search-Based Performance Tuning, 2004, Int. J. High Perform. Comput. Appl.
[27] Ting Wang, et al. Optimizing SpMV for Diagonal Sparse Matrices on GPU, 2011, 2011 International Conference on Parallel Processing.
[28] Prakash S. Raghavendra, et al. Predicting an Optimal Sparse Matrix Format for SpMV Computation on GPU, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[29] Keshav Pingali, et al. Next-generation generic programming and its application to sparse matrix computations, 2000, ICS '00.
[30] Ankit Jain. pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures, 2008.
[31] Matteo Frigo, et al. A fast Fourier transform compiler, 1999, SIGP.
[32] Fred G. Gustavson, et al. Symbolic Generation of an Optimal Crout Algorithm for Sparse Systems of Linear Equations, 1970, JACM.
[33] Pascal Giorgi, et al. Generating Optimized Sparse Matrix Vector Product over Finite Fields, 2014, ICMS.
[34] Alistair P. Rendell, et al. Reinforcement learning for automated performance tuning: Initial evaluation for sparse matrix format selection, 2008, 2008 IEEE International Conference on Cluster Computing.
[35] Nectarios Koziris, et al. CSX: an extended compression format for SpMV on shared memory systems, 2011, PPoPP '11.
[36] Jacques Carette, et al. Multi-stage programming with functors and monads: eliminating abstraction overhead from generic code, 2005, GPCE'05.
[37] David E. Keyes, et al. Towards Realistic Performance Bounds for Implicit CFD Codes, 2000.
[38] Richard W. Vuduc, et al. Sparsity: Optimization Framework for Sparse Matrix Kernels, 2004, Int. J. High Perform. Comput. Appl.
[39] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.
[40] Katherine Yelick, et al. Autotuning Sparse Matrix-Vector Multiplication for Multicore, 2012.
[41] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[42] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[43] Yoshinari Fukui, et al. Supercomputing of circuits simulation, 1989, Supercomputing '89.
[44] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[45] Alistair P. Rendell, et al. Generating optimal CUDA sparse matrix–vector product implementations for evolving GPU hardware, 2012, Concurr. Comput. Pract. Exp.
[46] Calvin J. Ribbens, et al. A Library for Pattern-based Sparse Matrix Vector Multiply, 2011, International Journal of Parallel Programming.
[47] Julia L. Lawall, et al. A tour of Tempo: a program specializer for the C language, 2004, Sci. Comput. Program.
[48] Walid A. Abu-Sufah, et al. Auto-tuning of Sparse Matrix-Vector Multiplication on Graphics Processors, 2013, ISC.
[49] Kurt Keutzer, et al. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs, 2012, ICS '12.
[50] Samuel Williams, et al. Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[51] Peter Lee, et al. Optimizing ML with run-time code generation, 1996, PLDI '96.
[52] Michael Garland, et al. Nitro: A Framework for Adaptive Code Variant Tuning, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[53] Eduardo F. D'Azevedo, et al. Vectorized Sparse Matrix Multiply for Compressed Row Storage Format, 2005, International Conference on Computational Science.