Adaptive sparse tiling for sparse matrix multiplication
P. Sadayappan | Aravind Sukumaran-Rajam | Israt Nisa | Changwan Hong | Kunal Singh