Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture
Hangrui Bi | Peng Li | Tao Wang | Yun Liang | Liqiang Lu | Yicheng Jin | Zizhang Luo