A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix–Matrix Multiplication Accelerator
Trevor Mudge | David Blaauw | Ronald G. Dreslinski | Hun-Seok Kim | Siying Feng | Paul Gao | Jielun Tan | Austin Rovinski | Shaolin Xie | Aporva Amarnath | Timothy Wesley | Jonathan Beaumont | Kuan-Yu Chen | Chun Zhao | Chaitali Chakrabarti | Michael Taylor | Subhankar Pal | Dong-Hyeon Park
[1] Gene Poole,et al. Accelerating the ANSYS Direct Sparse Solver with GPUs , 2011 .
[2] Constantine Bekas,et al. Analyzing the energy-efficiency of sparse matrix multiplication on heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[3] Ngai Wong,et al. Design space exploration for sparse matrix-matrix multiplication on FPGAs , 2010, FPT.
[4] John R. Gilbert,et al. An interactive system for combinatorial scientific computing with an emphasis on programmer productivity , 2007 .
[5] Pradeep Dubey,et al. Navigating the maze of graph analytics frameworks using massive graph datasets , 2014, SIGMOD Conference.
[6] Francisco Vázquez,et al. Fast Sparse Matrix Matrix Product Based on ELLR-T and GPU Computing , 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.
[7] Steven Dalton,et al. Optimizing Sparse Matrix-Matrix Multiplication for the GPU , 2015 .
[8] John R. Gilbert,et al. A Unified Framework for Numerical and Combinatorial Computing , 2008, Computing in Science & Engineering.
[9] John R. Gilbert,et al. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments , 2011, SIAM J. Sci. Comput..
[10] John R. Gilbert,et al. Parallel Triangle Counting and Enumeration Using Matrix Algebra , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[11] Richard Dorrance,et al. A 190GFLOPS/W DSP for energy-efficient sparse-BLAS in embedded IoT , 2016, 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits).
[12] Ümit V. Çatalyürek,et al. Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi , 2013, PPAM.
[13] Haim Kaplan,et al. Colored intersection searching via sparse rectangular matrix multiplication , 2006, SCG '06.
[14] John R. Gilbert,et al. High-Performance Graph Algorithms from Parallel Sparse Matrices , 2006, PARA.
[15] Bülent Yener,et al. Graph Theoretic and Spectral Analysis of Enron Email Data , 2005, Comput. Math. Organ. Theory.
[16] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[17] Brian W. Barrett,et al. Introducing the Graph 500 , 2010 .
[18] Sudhakar Yalamanchili,et al. Power Modeling for GPU Architectures Using McPAT , 2014, TODE.
[19] Vaclav Hapla,et al. Use of Direct Solvers in TFETI Massively Parallel Implementation , 2012, PARA.
[20] John R. Gilbert,et al. Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication , 2008, 2008 37th International Conference on Parallel Processing.
[21] John R. Gilbert,et al. The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..
[22] David Blaauw,et al. A 4.5Tb/s 3.4Tb/s/W 64×64 switch fabric with self-updating least-recently-granted priority and quality-of-service arbitration in 45nm CMOS , 2012, 2012 IEEE International Solid-State Circuits Conference.
[23] S. Dongen. Graph clustering by flow simulation , 2000 .
[24] David Blaauw,et al. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[25] Luke N. Olson,et al. Optimizing Sparse Matrix–Matrix Multiplication for the GPU , 2015, ACM Trans. Math. Softw..
[26] Philip Heng Wai Leong,et al. A Model for Matrix Multiplication Performance on FPGAs , 2011, 2011 21st International Conference on Field Programmable Logic and Applications.
[27] Sanu Mathew,et al. 2.9TOPS/W Reconfigurable Dense/Sparse Matrix-Multiply Accelerator with Unified INT8/INT16/FP16 Datapath in 14nm Tri-Gate CMOS , 2018, 2018 IEEE Symposium on VLSI Circuits.
[28] Gerald Penn,et al. Efficient transitive closure of sparse matrices over closed semirings , 2006, Theor. Comput. Sci..
[29] Trevor Mudge,et al. A 4.5 Tb/s 3.4 Tb/s/W 64×64 Switch Fabric with Self-Updating Least-Recently-Granted Priority and Quality-of-Service Arbitration in 45 nm CMOS , 2018 .