Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling
Christopher W. Fletcher | Hadi Asghari Moghaddam | J. Emer | John Douglas Owens | A. Jaleel | Edgar Solomonik | Michael Pellauer | N. Crago | Po-An Tsai | Kartik Hegde | Toluwanimi O. Odemuyiwa
[1] Christopher W. Fletcher,et al. Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract), 2023, HOPC@SPAA.
[2] Ümit V. Çatalyürek,et al. On Symmetric Rectilinear Partitioning , 2022, ACM J. Exp. Algorithmics.
[3] Jaehyuk Huh,et al. InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-Aware Inner Product Processing , 2021, 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[4] J. Emer,et al. Gamma: leveraging Gustavson’s algorithm to accelerate sparse matrix multiplication , 2021, International Conference on Architectural Support for Programming Languages and Operating Systems.
[5] Süreyya Emre Kurt,et al. Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Nitish Srivastava,et al. MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[7] Shao-Yi Chien,et al. GrateTile: Efficient Sparse Tensor Tiling for CNN Processing , 2020, 2020 IEEE Workshop on Signal Processing Systems (SiPS).
[8] Erich Elsen,et al. Sparse GPU Kernels for Deep Learning , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] E. Boman,et al. On Optimal Partitioning For Sparse Matrices In Variable Block Row Format , 2020, ArXiv.
[10] Bahar Asgari,et al. ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[11] Song Han,et al. SpArch: Efficient Architecture for Sparse Matrix Multiplication , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[12] Dipankar Das,et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[13] Nitish Srivastava,et al. Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[14] Ariful Azad,et al. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors , 2019, Parallel Comput.
[15] Donghyuk Lee,et al. Near-memory data transformation for efficient sparse matrix multi-vector multiplication , 2019, SC.
[16] Gunnar Rätsch,et al. Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons , 2019, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[17] Vivienne Sze,et al. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs , 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[18] Aamer Jaleel,et al. ExTensor: An Accelerator for Sparse Tensor Algebra , 2019, MICRO.
[19] Nathan Beckmann,et al. PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates , 2019, MICRO.
[20] Tze Meng Low,et al. Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization , 2019, MICRO.
[21] T. N. Vijaykumar,et al. SparTen: A Sparse Tensor Accelerator for Convolutional Neural Networks , 2019, MICRO.
[22] Xiaolan Liu,et al. A Sequentially Truncated Higher Order Singular Value Decomposition-Based Algorithm for Tensor Completion , 2019, IEEE Transactions on Cybernetics.
[23] Jason Clemons,et al. Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration , 2019, ASPLOS.
[24] P. Sadayappan,et al. Adaptive sparse tiling for sparse matrix multiplication , 2019, PPoPP.
[25] Katherine Yelick,et al. BELLA: Berkeley Efficient Long-Read to Long-Read Aligner and Overlapper , 2018, bioRxiv.
[26] Christopher W. Fletcher,et al. Morph: Flexible Acceleration for 3D CNN-Based Video Understanding , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] Saman P. Amarasinghe,et al. Format abstraction for sparse tensor algebra compilers , 2018, Proc. ACM Program. Lang.
[28] Mengjia Yan,et al. UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[29] David Blaauw,et al. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[30] John D. Owens,et al. Design Principles for Sparse Matrix Multiplication on the GPU , 2018, Euro-Par.
[31] Georgios A. Pavlopoulos,et al. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks , 2018, Nucleic acids research.
[32] Shoaib Kamil,et al. The tensor algebra compiler , 2017, Proc. ACM Program. Lang.
[33] Hans-Peter Seidel,et al. Globally homogeneous, locally adaptive sparse matrix-vector multiplication on the GPU , 2017, ICS.
[34] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[35] Torsten Hoefler,et al. Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication , 2016, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[36] Vivienne Sze,et al. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[37] Michael Garland,et al. Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format , 2016, PPoPP.
[38] Torsten Hoefler,et al. Sparse Tensor Algebra as a Parallel Programming Model , 2015, ArXiv.
[39] Tamara G. Kolda,et al. Parallel Tensor Compression for Large-Scale Scientific Data , 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[40] John R. Gilbert,et al. Parallel Triangle Counting and Enumeration Using Matrix Algebra , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[41] Jia Wang,et al. DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[42] Austin R. Benson,et al. A framework for practical parallel fast matrix multiplication , 2014, PPoPP.
[43] Michael Stonebraker,et al. Standards for graph algorithm primitives , 2014, 2013 IEEE High Performance Extreme Computing Conference (HPEC).
[44] Jure Leskovec,et al. SNAP Datasets: Stanford Large Network Dataset Collection , 2014.
[45] Samuel Williams,et al. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[46] Daniel Kats,et al. Sparse tensor framework for implementation of general local correlation methods. , 2013, The Journal of chemical physics.
[47] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[48] John R. Gilbert,et al. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments , 2011, SIAM J. Sci. Comput.
[49] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[50] Hyun Jin Moon,et al. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure , 2005, HPCC.
[51] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[52] Sebastiano Vigna,et al. The webgraph framework I: compression techniques , 2004, WWW '04.
[53] Larry Carter,et al. Sparse Tiling for Stationary Iterative Methods , 2004, Int. J. High Perform. Comput. Appl.
[54] S. Dongen. Graph clustering by flow simulation , 2000.
[55] A. Einstein. The Foundation of the General Theory of Relativity , 1916.
[56] A. Einstein,et al. Die Grundlage der allgemeinen Relativitätstheorie , 1916.