TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
Nandeeka Nayak, Toluwanimi O. Odemuyiwa, Shubham Ugare, Christopher W. Fletcher, Michael Pellauer, Joel S. Emer
[1] Christopher W. Fletcher, et al. Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling, 2023, ASPLOS.
[2] José L. Abellán, et al. Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing, 2023, ASPLOS.
[3] K. Olukotun, et al. The Sparse Abstract Machine, 2022, ASPLOS.
[4] Saman P. Amarasinghe, et al. Autoscheduling for sparse tensor algebra with an asymptotic cost model, 2022, PLDI.
[5] Yannan Nellie Wu, et al. Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling, 2022, MICRO.
[6] Rajgopal Kannan, et al. Reconfigurable Low-latency Memory System for Sparse Matricized Tensor Times Khatri-Rao Product on FPGA, 2021, HPEC.
[7] José L. Abellán, et al. STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators, 2021, IEEE Computer Architecture Letters.
[8] James Demmel, et al. CoSA: Scheduling by Constrained Optimization for Spatial Accelerators, 2021, ISCA.
[9] J. Emer, et al. Gamma: leveraging Gustavson's algorithm to accelerate sparse matrix multiplication, 2021, ASPLOS.
[10] Vikas Chandra, et al. Mind mappings: enabling efficient algorithm-accelerator mapping space search, 2021, ASPLOS.
[11] Vivienne Sze, et al. Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators, 2021, ISPASS.
[12] Shoaib Kamil, et al. A sparse iteration space transformation framework for sparse tensor algebra, 2020, Proc. ACM Program. Lang.
[13] Nitish Srivastava, et al. MatRaptor: A Sparse-Sparse Matrix Multiplication Accelerator Based on Row-Wise Product, 2020, MICRO.
[14] Andreas Moshovos, et al. TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training, 2020, MICRO.
[15] Marian Verhelst, et al. ZigZag: A Memory-Centric Rapid DNN Accelerator Design Space Exploration Framework, 2020, arXiv.
[16] V. Sze, et al. Efficient Processing of Deep Neural Networks, 2020, Synthesis Lectures on Computer Architecture.
[17] Dipankar Das, et al. SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training, 2020, HPCA.
[18] Nitish Srivastava, et al. Tensaurus: A Versatile Accelerator for Mixed Sparse-Dense Tensor Computations, 2020, HPCA.
[19] Song Han, et al. SpArch: Efficient Architecture for Sparse Matrix Multiplication, 2020, HPCA.
[20] Ariful Azad, et al. Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors, 2019, Parallel Comput.
[21] Vivienne Sze, et al. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs, 2019, ICCAD.
[22] Aamer Jaleel, et al. ExTensor: An Accelerator for Sparse Tensor Algebra, 2019, MICRO.
[23] Jason Clemons, et al. Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration, 2019, ASPLOS.
[24] Brucek Khailany, et al. Timeloop: A Systematic Approach to DNN Accelerator Evaluation, 2019, ISPASS.
[25] Mingyu Gao, et al. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators, 2018, ASPLOS.
[26] Xing Liu, et al. Blocking Optimization Techniques for Sparse Tensor Computation, 2018, IPDPS.
[27] Hyoukjun Kwon, et al. MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators, 2018, arXiv.
[28] Saman P. Amarasinghe, et al. Format abstraction for sparse tensor algebra compilers, 2018, Proc. ACM Program. Lang.
[29] David Blaauw, et al. OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator, 2018, HPCA.
[30] Hyoukjun Kwon, et al. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects, 2018, ASPLOS.
[31] Shoaib Kamil, et al. The tensor algebra compiler, 2017, Proc. ACM Program. Lang.
[32] William J. Dally, et al. SCNN: An accelerator for compressed-sparse convolutional neural networks, 2017, ISCA.
[33] Xiaowei Li, et al. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks, 2017, HPCA.
[34] Patrick Seewald, et al. Large-Scale Cubic-Scaling Random Phase Approximation Correlation Energy Calculations Using a Gaussian Basis, 2016, Journal of Chemical Theory and Computation.
[35] Margaret Martonosi, et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics, 2016, MICRO.
[36] Torsten Hoefler, et al. Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication, 2017, SC.
[37] Natalie D. Enright Jerger, et al. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, 2016, ISCA.
[38] Vivienne Sze, et al. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks, 2016, ISCA.
[39] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, ISCA.
[40] John R. Gilbert, et al. Parallel Triangle Counting and Enumeration Using Matrix Algebra, 2015, IPDPSW.
[41] Nikos D. Sidiropoulos, et al. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication, 2015, IPDPS.
[42] Pradeep Dubey, et al. GraphMat: High performance graph analytics made productive, 2015, Proc. VLDB Endow.
[43] Michael Stonebraker, et al. Standards for graph algorithm primitives, 2013, HPEC.
[44] Jure Leskovec, et al. SNAP Datasets: Stanford Large Network Dataset Collection, 2014.
[45] Samuel Williams, et al. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations, 2014, IPDPS.
[46] Rasmus Pagh, et al. Fast and scalable polynomial kernels via explicit feature maps, 2013, KDD.
[47] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI.
[48] Joost VandeVondele, et al. Linear Scaling Self-Consistent Field Calculations with Millions of Atoms in the Condensed Phase, 2012, Journal of Chemical Theory and Computation.
[49] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, ACM TOMS.
[50] S. van Dongen. Graph clustering by flow simulation, 2000.
[51] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web, 1999, WWW.
[52] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[53] Joost VandeVondele, et al. cp2k: atomistic simulations of condensed matter systems, 2014, WIREs Computational Molecular Science.
[54] Trevor Mudge, et al. Notes on Calculating Computer Performance, 1995.
[55] A. Einstein. The Foundation of the General Theory of Relativity, 1916, Annalen der Physik.