A Systematic Survey of General Sparse Matrix-matrix Multiplication
[1] Jaeha Kung,et al. AutoRelax: HW-SW Co-Optimization for Efficient SpGEMM Operations With Automated Relaxation in Deep Learning , 2022, IEEE Transactions on Emerging Topics in Computing.
[2] Zhengyang Lu,et al. TileSpGEMM: a tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs , 2022, PPoPP.
[3] John R. Gilbert,et al. Combinatorial BLAS 2.0: Scaling Combinatorial Algorithms on Distributed-Memory Systems , 2021, IEEE Transactions on Parallel and Distributed Systems.
[4] Jia-Min Shieh,et al. CiM3D: Comparator-in-Memory Designs Using Monolithic 3-D Technology for Accelerating Data-Intensive Applications , 2021, IEEE Journal on Exploratory Solid-State Computational Devices and Circuits.
[5] Kaustubh Shivdikar,et al. SMASH: Sparse Matrix Atomic Scratchpad Hashing , 2021, ArXiv.
[6] Gagan Agrawal,et al. Scaling Sparse Matrix Multiplication on CPU-GPU Nodes , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[7] Thomas B. Rolinger,et al. Optimizing Memory-Compute Colocation for Irregular Applications on a Migratory Thread Architecture , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[8] J. Emer,et al. Gamma: leveraging Gustavson’s algorithm to accelerate sparse matrix multiplication , 2021, International Conference on Architectural Support for Programming Languages and Operating Systems.
[9] Sivasankaran Rajamanickam,et al. Kokkos Kernels: Performance Portable Sparse/Dense Linear Algebra and Graph Kernels , 2021, ArXiv.
[10] Dipankar Das,et al. Extending Sparse Tensor Accelerators to Support Multiple Compression Formats , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[11] Jiajia Li,et al. Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory , 2021, PPoPP.
[12] Hari Sundar,et al. A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication , 2021, HPC Asia.
[13] Hariram Thirucherai Govindarajan,et al. Monolithic 3D+-IC Based Massively Parallel Compute-in-Memory Macro for Accelerating Database and Machine Learning Primitives , 2020, 2020 IEEE International Electron Devices Meeting (IEDM).
[14] Cevdet Aykanat,et al. Cartesian Partitioning Models for 2D and 3D Parallel SpGEMM Algorithms , 2020, IEEE Transactions on Parallel and Distributed Systems.
[15] T. Katagiri,et al. Performance Evaluation of Accurate Matrix-Matrix Multiplication on GPU Using Sparse Matrix Multiplications , 2020, 2020 Eighth International Symposium on Computing and Networking Workshops (CANDARW).
[16] Leonid Oliker,et al. Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly , 2020, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[17] Ariful Azad,et al. Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale , 2020, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[18] Nitish Srivastava,et al. MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Ariful Azad,et al. Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Orestis Zachariadis,et al. Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores , 2020, Comput. Electr. Eng..
[21] Martin Herbordt,et al. FP-AMG: FPGA-Based Acceleration Framework for Algebraic Multigrid Solvers , 2020, 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[22] Richard P. Martin,et al. Synergistic CPU-FPGA Acceleration of Sparse Linear Algebra , 2020, ArXiv.
[23] Yongjun Park,et al. Optimization of GPU-based Sparse Matrix Multiplication for Large Sparse Networks , 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE).
[24] Ariful Azad,et al. Bandwidth Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking , 2020, SPAA.
[25] Ariful Azad,et al. Optimizing High Performance Markov Clustering for Pre-Exascale Architectures , 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[26] Martin Winter,et al. spECK: accelerating GPU sparse matrix-matrix multiplication through lightweight analysis , 2020, PPoPP.
[27] Song Han,et al. SpArch: Efficient Architecture for Sparse Matrix Multiplication , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[28] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[29] Timothy A. Davis,et al. Algorithm 1000 , 2019, ACM Transactions on Mathematical Software.
[30] Ariful Azad,et al. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors , 2019, Parallel Comput..
[31] Fugang Wang,et al. Generalized Sparse Matrix-Matrix Multiplication for Vector Engines and Graph Applications , 2019, 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC).
[32] Onur Mutlu,et al. SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations , 2019, MICRO.
[33] Aamer Jaleel,et al. ExTensor: An Accelerator for Sparse Tensor Algebra , 2019, MICRO.
[34] T. N. Vijaykumar,et al. SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks , 2019, MICRO.
[35] Gu-Yeon Wei,et al. MaxNVM: Maximizing DNN Storage Density and Inference Efficiency with Sparse Encoding and Error Mitigation , 2019, MICRO.
[36] Jeanine Cook,et al. MetaStrider , 2019, ACM Trans. Archit. Code Optim..
[37] John D. Owens,et al. Accelerating DNN Inference with GraphBLAS and the GPU , 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC).
[38] Jiaming Xie,et al. SPART: Optimizing CNNs by Utilizing Both Sparsity of Weights and Feature Maps , 2019, APPT.
[39] Xiaoyong Du,et al. Performance evaluation and analysis of sparse matrix and graph kernels on heterogeneous processors , 2019, CCF Transactions on High Performance Computing.
[40] Kenli Li,et al. Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer , 2019, IEEE Transactions on Parallel and Distributed Systems.
[41] Guoqing Xiao,et al. Optimizing partitioned CSR-based SpGEMM on the Sunway TaihuLight , 2019, Neural Computing and Applications.
[42] Hans-Peter Seidel,et al. Adaptive sparse matrix-matrix multiplication on the GPU , 2019, PPoPP.
[43] Cevdet Aykanat,et al. Scaling sparse matrix-matrix multiplication in the Accumulo database , 2019, Distributed and Parallel Databases.
[44] Guangming Tan,et al. Register-Aware Optimizations for Parallel Sparse Matrix–Matrix Multiplication , 2019, International journal of parallel programming.
[45] Katherine Yelick,et al. BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper , 2018, bioRxiv.
[46] Christopher M. Siefert,et al. Low Thread-Count Gustavson: A Multithreaded Algorithm for Sparse Matrix-Matrix Multiplication Using Perfect Hashing , 2018, 2018 IEEE/ACM 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (scalA).
[47] Timothy A. Davis,et al. Graph algorithms via SuiteSparse: GraphBLAS: triangle counting and K-truss , 2018, 2018 IEEE High Performance Extreme Computing Conference (HPEC).
[48] Sivasankaran Rajamanickam,et al. Fast Triangle Counting Using Cilk , 2018, 2018 IEEE High Performance Extreme Computing Conference (HPEC).
[49] Uwe Naumann,et al. Memory-Efficient Sparse Matrix-Matrix Multiplication by Row Merging on Many-Core Architectures , 2018, SIAM J. Sci. Comput..
[50] Amanda Bienz,et al. Reducing communication in sparse solvers , 2018 .
[51] Satoshi Matsuoka,et al. High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures , 2018, ICPP Workshops.
[52] Simon D. Hammond,et al. Sparse Matrix-Matrix Multiplication on Multilevel Memory Architectures: Algorithms and Experiments , 2018, ArXiv.
[53] David Blaauw,et al. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[54] Mehmet Deveci,et al. Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures , 2018, Parallel Comput..
[55] Kadir Akbudak,et al. Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication , 2018, ACM Trans. Parallel Comput..
[56] P. Sadayappan,et al. Characterization of Data Movement Requirements for Sparse Matrix Computations on GPUs , 2017, 2017 IEEE 24th International Conference on High Performance Computing (HiPC).
[57] Simon D. Hammond,et al. Fast linear algebra-based triangle counting with KokkosKernels , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[58] William Song,et al. Static graph challenge: Subgraph isomorphism , 2017, 2017 IEEE High Performance Extreme Computing Conference (HPEC).
[59] Alfio Lazzaro,et al. Porting of the DBCSR Library for Sparse Matrix-Matrix Multiplications to Intel Xeon Phi Systems , 2017, PARCO.
[60] Satoshi Matsuoka,et al. High-Performance and Memory-Saving Sparse General Matrix-Matrix Multiplication for NVIDIA Pascal GPU , 2017, 2017 46th International Conference on Parallel Processing (ICPP).
[61] Kadir Akbudak,et al. Exploiting Locality in Sparse Matrix-Matrix Multiplication on Many-Core Architectures , 2017, IEEE Transactions on Parallel and Distributed Systems.
[62] Weifeng Liu,et al. Fast segmented sort on GPUs , 2017, ICS.
[63] Mehmet Deveci,et al. Performance-Portable Sparse Matrix-Matrix Multiplication for Many-Core Architectures , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[64] Davide Barbieri,et al. Sparse Matrix-Vector Multiplication on GPGPUs , 2017, ACM Trans. Math. Softw..
[65] Vivek Sarkar,et al. A survey of sparse matrix-vector multiplication performance on large matrices , 2016, ArXiv.
[66] Yonggang Wen,et al. Balanced Hashing and Efficient GPU Sparse General Matrix-Matrix Multiplication , 2016, ICS.
[67] John D. Owens,et al. A Comparative Study on Exact Triangle Counting Algorithms on the GPU , 2016, HPGP@HPDC.
[68] Mehmet Deveci,et al. Parallel Graph Coloring for Manycore Architectures , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[69] Oded Schwartz,et al. Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication , 2016, TOPC.
[70] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[71] V. Sze,et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks , 2016, IEEE Journal of Solid-State Circuits.
[72] Jonathan W. Berry,et al. A task-based linear algebra Building Blocks approach for scalable graph analytics , 2015, 2015 IEEE High Performance Extreme Computing Conference (HPEC).
[73] Robert Strzodka,et al. AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods , 2015, SIAM J. Sci. Comput..
[74] Luke N. Olson,et al. Optimizing Sparse Matrix-Matrix Multiplication for the GPU , 2015, ACM Trans. Math. Softw..
[75] Samuel Williams,et al. Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication , 2015, SIAM J. Sci. Comput..
[76] Pradeep Dubey,et al. Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms , 2015, ISC.
[77] Jeremy Kepner,et al. Graphulo implementation of server-side sparse matrix multiply in the Accumulo database , 2015, 2015 IEEE High Performance Extreme Computing Conference (HPEC).
[78] Alessandro Curioni,et al. Semiempirical Molecular Dynamics (SEMD) I: Midpoint-Based Parallel Sparse Matrix-Matrix Multiplication Algorithm for Matrices with Decay , 2015, Journal of chemical theory and computation.
[79] Jeremy Kepner,et al. Graphulo: Linear Algebra Graph Kernels for NoSQL Databases , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[80] John R. Gilbert,et al. Parallel Triangle Counting and Enumeration Using Matrix Algebra , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[81] Brian Vinter,et al. A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors , 2015, J. Parallel Distributed Comput..
[82] Christoph Lenzen,et al. Algebraic methods in the congested clique , 2015, Distributed Computing.
[83] Jonathan J. Hu,et al. Reducing Communication Costs for Sparse Matrix Multiplication within Algebraic Multigrid , 2015, SIAM J. Sci. Comput..
[84] Emanuel H. Rubensson,et al. Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model , 2015, Parallel Comput..
[85] Uwe Naumann,et al. GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging , 2015, SIAM J. Sci. Comput..
[86] Huy T. Vo,et al. The More the Merrier: Efficient Multi-Source Graph Traversal , 2014, Proc. VLDB Endow..
[87] Daniel Sunderland,et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns , 2014, J. Parallel Distributed Comput..
[88] John F. Stanton,et al. A massively parallel tensor contraction framework for coupled-cluster computations , 2014, J. Parallel Distributed Comput..
[89] Kadir Akbudak,et al. Simultaneous Input and Output Matrix Partitioning for Outer-Product-Parallel Sparse Matrix-Matrix Multiplication , 2014, SIAM J. Sci. Comput..
[90] Michael Stonebraker,et al. Standards for graph algorithm primitives , 2014, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
[91] Brian Vinter,et al. An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[92] Joost VandeVondele,et al. Sparse matrix multiplication: The distributed block-compressed sparse row library , 2014, Parallel Comput..
[93] Rasmus Pagh,et al. The Input/Output Complexity of Sparse Matrix Multiplication , 2014, ESA.
[94] Mehmet Deveci,et al. Hypergraph Sparsification and Its Application to Partitioning , 2013, 2013 42nd International Conference on Parallel Processing.
[95] Kun-Lung Wu,et al. Counting and Sampling Triangles from a Graph Stream , 2013, Proc. VLDB Endow..
[96] Gustavo Alonso,et al. Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited , 2013, Proc. VLDB Endow..
[97] James Demmel,et al. Communication optimal parallel multiplication of sparse random matrices , 2013, SPAA.
[98] Kiran Kumar Matam,et al. Sparse matrix-matrix multiplication on modern architectures , 2012, 2012 19th International Conference on High Performance Computing.
[99] Emanuel H. Rubensson,et al. Chunks and Tasks: A programming model for parallelization of dynamic algorithms , 2012, Parallel Comput..
[100] Luke N. Olson,et al. Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods , 2012, SIAM J. Sci. Comput..
[101] A. Grimshaw,et al. High Performance and Scalable Radix Sorting: a Case Study of Implementing Dynamic Parallelism for GPU Computing , 2011, Parallel Process. Lett..
[102] John R. Gilbert,et al. The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..
[103] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[104] John R. Gilbert,et al. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments , 2011, SIAM J. Sci. Comput..
[105] Yudong Zhang,et al. A novel algorithm for all pairs shortest path problem based on matrix multiplication and pulse coupled neural network , 2011, Digit. Signal Process..
[106] Kamesh Madduri,et al. Parallel breadth-first search on distributed memory systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[107] Christos Faloutsos,et al. Spectral counting of triangles via element-wise sparsification and triangle-based link recommendation , 2011, Social Network Analysis and Mining.
[108] Ngai Wong,et al. Design space exploration for sparse matrix-matrix multiplication on FPGAs , 2010, 2010 International Conference on Field-Programmable Technology.
[109] Anthony K. H. Tung,et al. On Triangulation-based Dense Neighborhood Graphs Discovery , 2010, Proc. VLDB Endow..
[110] Sriram Krishnamoorthy,et al. Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems , 2010, 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS).
[111] Rasmus Resen Amossen,et al. Better Size Estimation for Sparse Matrix Products , 2010, Algorithmica.
[112] John R. Gilbert,et al. Highly Parallel Sparse Matrix-Matrix Multiplication , 2010, ArXiv.
[113] Pradeep Dubey,et al. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs , 2009, Proc. VLDB Endow..
[114] Joseph M. Hellerstein,et al. MAD Skills: New Analysis Practices for Big Data , 2009, Proc. VLDB Endow..
[115] Jonathan Cohen,et al. Graph Twiddling in a MapReduce World , 2009, Computing in Science & Engineering.
[116] David Eppstein,et al. The H-index of a Graph and Its Application to Dynamic Subgraph Statistics , 2022, Journal of Graph Algorithms and Applications.
[117] John R. Gilbert,et al. Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication , 2008, 2008 37th International Conference on Parallel Processing.
[118] John R. Gilbert,et al. On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[119] John R. Gilbert,et al. A Unified Framework for Numerical and Combinatorial Computing , 2008, Computing in Science & Engineering.
[120] Ralf Lämmel,et al. Google's MapReduce programming model - Revisited , 2007, Sci. Comput. Program..
[121] Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs , 2007, STOC '07.
[122] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[123] R.D. Falgout,et al. An Introduction to Algebraic Multigrid Computing , 2006, Computing in Science & Engineering.
[124] Raphael Yuster,et al. Finding heaviest H-subgraphs in real weighted graphs, with applications , 2006, TALG.
[125] Michael A. Heroux,et al. PyTrilinos: High-performance distributed-memory solvers for Python , 2006, TOMS.
[126] John R. Gilbert,et al. High-Performance Graph Algorithms from Parallel Sparse Matrices , 2006, PARA.
[127] Haim Kaplan,et al. Colored intersection searching via sparse rectangular matrix multiplication , 2006, SCG '06.
[128] Tamara G. Kolda,et al. An overview of the Trilinos project , 2005, TOMS.
[129] Raphael Yuster,et al. Fast sparse matrix multiplication , 2004, TALG.
[130] Sotirios G. Ziavras,et al. A Super-Programming Technique for Large Sparse Matrix Multiplication on PC Clusters , 2004, IEICE Trans. Inf. Syst..
[131] William L. Briggs,et al. A multigrid tutorial, Second Edition , 2000 .
[132] Edith Cohen,et al. Structure Prediction and Computation of Sparse Matrix Products , 1998, J. Comb. Optim..
[133] Edith Cohen,et al. Size-Estimation Framework with Applications to Transitive Closure and Reachability , 1997, J. Comput. Syst. Sci..
[134] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[135] Guy E. Blelloch,et al. Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors , 1993 .
[136] Vijay V. Vazirani,et al. Maximum Matchings in General Graphs Through Randomization , 1989, J. Algorithms.
[137] Fred G. Gustavson,et al. Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition , 1978, TOMS.
[138] I. Duff. A survey of sparse matrix research , 1977, Proceedings of the IEEE.
[139] Cevdet Aykanat,et al. Locality-aware and load-balanced static task scheduling for MapReduce , 2019, Future Gener. Comput. Syst..
[140] T. Davis. Algorithm 1000: SuiteSparse: GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra , 2019, ACM Trans. Math. Softw..
[141] Mehmet Deveci,et al. Sparse Matrix-Matrix Multiplication for Modern Architectures , 2016 .
[142] Ernest Jamro,et al. The Algorithms for FPGA Implementation of Sparse Matrices Multiplication , 2014, Comput. Informatics.
[143] Endong Wang,et al. Intel Math Kernel Library , 2014 .
[144] Igor L. Markov,et al. Hypergraph Partitioning , 2011, Encyclopedia of Parallel Computing.
[145] Joachim Georgii,et al. A Streaming Approach for Sparse Matrix Products and Its Application in Galerkin Multigrid Methods , 2010 .
[146] James Reinders,et al. Intel® threading building blocks , 2008 .
[147] R. Falgout. An Introduction to Algebraic Multigrid , 2006, Comput. Sci. Eng..
[148] Barbara Kitchenham,et al. Procedures for Performing Systematic Reviews , 2004 .
[149] John R. Gilbert,et al. Sparse Matrices in MATLAB: Design and Implementation , 1992, SIAM J. Matrix Anal. Appl..
[150] William L. Briggs,et al. A multigrid tutorial , 1987 .
[151] Lynn Elliot Cannon,et al. A cellular computer to implement the Kalman filter algorithm , 1969 .