A Systematic Survey of General Sparse Matrix-matrix Multiplication

General Sparse Matrix-Matrix Multiplication (SpGEMM) has attracted much attention from researchers in graph analytics, scientific computing, and deep learning. Many optimization techniques have been developed for different applications and computing architectures over the past decades. The objective of this article is to provide a structured and comprehensive overview of research on SpGEMM. Existing work is grouped into categories based on target architectures and design choices. Covered topics include typical applications, compression formats, general formulations, key problems and techniques, architecture-oriented optimizations, and programming models. The rationales of different algorithms are analyzed and summarized. This survey covers progress in SpGEMM research up to 2021. Moreover, a thorough performance comparison of existing implementations is presented. Based on our findings, we highlight future research directions to encourage better designs and implementations in later studies.
