SpArch: Efficient Architecture for Sparse Matrix Multiplication
Zhekai Zhang | Hanrui Wang | Song Han | William J. Dally
[1] Gerald Penn,et al. Efficient transitive closure of sparse matrices over closed semirings , 2006, Theor. Comput. Sci..
[2] J. Hennessy. A new golden age for computer architecture: Domain-specific hardware/software co-design, enhanced security, open instruction sets, and agile chip development , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[3] Gang Wang,et al. Fast lists intersection with Bloom filter using graphics processing units , 2011, SAC '11.
[4] Brian Vinter,et al. An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[5] Brian W. Barrett,et al. Introducing the Graph 500 , 2010 .
[6] Yong Dou,et al. High performance sparse matrix-vector multiplication on FPGA , 2013, IEICE Electron. Express.
[7] Mehmet Deveci,et al. Sparse Matrix-Matrix Multiplication for Modern Architectures , 2016 .
[8] Yuan Xie,et al. Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs , 2019, MICRO.
[9] Santiago Badia,et al. A Highly Scalable Parallel Implementation of Balancing Domain Decomposition by Constraints , 2014, SIAM J. Sci. Comput..
[10] Endong Wang,et al. Intel Math Kernel Library , 2014 .
[11] Song Han,et al. Learning both Weights and Connections for Efficient Neural Networks , 2015, NIPS.
[12] Jeff Rearick,et al. Unleashing Fury: A New Paradigm for 3-D Design and Test , 2017, IEEE Design & Test.
[13] Song Han,et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA , 2016, FPGA.
[14] Joan Antoni Sellarès,et al. Intersecting two families of sets on the GPU , 2017, J. Parallel Distributed Comput..
[15] Satoshi Itoh,et al. Order-N tight-binding molecular dynamics on parallel computers , 1995 .
[16] Ngai Wong,et al. Design space exploration for sparse matrix-matrix multiplication on FPGAs , 2010, 2010 International Conference on Field-Programmable Technology.
[17] Christopher Robert Cullinan,et al. Computing Performance Benchmarks among CPU, GPU, and FPGA , 2012 .
[18] Warren J. Gross,et al. FPGA architecture and implementation of sparse matrix-vector multiplication for the finite element method , 2008, Comput. Phys. Commun..
[19] Jason Cong,et al. Understanding Performance Differences of FPGAs and GPUs , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).
[20] John R. Gilbert,et al. A Unified Framework for Numerical and Combinatorial Computing , 2008, Computing in Science & Engineering.
[21] H. T. Kung. Why systolic architectures? , 1982, Computer.
[22] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[23] Conrad Sanderson,et al. Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation , 2018 .
[24] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[25] Conrad Sanderson,et al. Armadillo: a template-based C++ library for linear algebra , 2016, J. Open Source Softw..
[26] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[27] Mark Horowitz,et al. Energy-Efficient Floating-Point Unit Design , 2011, IEEE Transactions on Computers.
[28] Lei Zou,et al. Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions , 2018, SIGMOD Conference.
[29] Samuel Williams,et al. Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication , 2015, SIAM J. Sci. Comput..
[30] Ernest Jamro,et al. The Algorithms for FPGA Implementation of Sparse Matrices Multiplication , 2014, Comput. Informatics.
[31] Aamer Jaleel,et al. ExTensor: An Accelerator for Sparse Tensor Algebra , 2019, MICRO.
[32] Song Han,et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.
[33] William J. Dally,et al. SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[34] T. N. Vijaykumar,et al. SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks , 2019, MICRO.
[35] Wayne Luk,et al. Optimising Sparse Matrix Vector multiplication for large scale FEM problems on FPGA , 2016, 2016 26th International Conference on Field Programmable Logic and Applications (FPL).
[36] Vijay V. Vazirani,et al. Maximum Matchings in General Graphs Through Randomization , 1989, J. Algorithms.
[37] Yiran Chen,et al. Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.
[38] Nectarios Koziris,et al. Understanding the Performance of Sparse Matrix-Vector Multiplication , 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).
[39] S. Dongen. Graph clustering by flow simulation , 2000 .
[40] David Blaauw,et al. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[41] Fred G. Gustavson,et al. Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition , 1978, TOMS.
[42] Ichitaro Yamazaki,et al. On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver , 2010, VECPAR.
[43] John R. Gilbert,et al. Parallel Triangle Counting and Enumeration Using Matrix Algebra , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[44] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[45] Gang Wang,et al. Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units , 2011, Proc. VLDB Endow..
[46] Vipin Kumar,et al. A parallel formulation of interior point algorithms , 1994, Proceedings of Supercomputing '94.
[47] Viktor K. Prasanna,et al. Sparse Matrix-Vector multiplication on FPGAs , 2005, FPGA '05.
[48] Song Han,et al. Learning to Design Circuits , 2018, ArXiv.
[49] Timothy M. Chan. More algorithms for all-pairs shortest paths in weighted graphs , 2007, STOC '07.
[50] Tim Kraska,et al. Park: An Open Platform for Learning-Augmented Computer Systems , 2019, NeurIPS.
[51] Song Han,et al. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[52] Jure Leskovec,et al. SNAP Datasets: Stanford Large Network Dataset Collection , 2014 .
[53] John R. Gilbert,et al. High-Performance Graph Algorithms from Parallel Sparse Matrices , 2006, PARA.
[54] Tsutomu Maruyama,et al. Performance comparison of FPGA, GPU and CPU in image processing , 2009, 2009 International Conference on Field Programmable Logic and Applications.
[55] Luke N. Olson,et al. Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods , 2012, SIAM J. Sci. Comput..