Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers
Jacob R. Stevens | Rangharajan Venkatesan | Steve Dai | Brucek Khailany | Anand Raghunathan
[1] William J. Dally, et al. MAGNet: A Modular Accelerator Generator for Neural Networks, 2019, 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[2] Moshe Wasserblat, et al. Q8BERT: Quantized 8Bit BERT, 2019, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[3] Zheng Zhang, et al. Star-Transformer, 2019, NAACL.
[4] Chao Tian, et al. Efficient Softmax Hardware Architecture for Deep Neural Networks, 2019, ACM Great Lakes Symposium on VLSI.
[5] Mehdi Rezagholizadeh, et al. Fully Quantized Transformer for Machine Translation, 2020, EMNLP.
[6] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[7] Danyang Zhu, et al. Efficient Precision-Adjustable Architecture for Softmax Function in Deep Learning, 2020, IEEE Transactions on Circuits and Systems II: Express Briefs.
[8] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, ArXiv.
[9] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[10] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[11] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[12] Sanchari Sen, et al. Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models, 2020, ArXiv.
[13] Fabrizio Lombardi, et al. Design and Implementation of an Approximate Softmax Layer for Deep Neural Networks, 2020, 2020 IEEE International Symposium on Circuits and Systems (ISCAS).
[14] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[15] Jingbo Zhu, et al. Towards Fully 8-bit Integer Inference for the Transformer Model, 2020, IJCAI.
[16] Natalia Gimelshein, et al. Online normalizer calculation for softmax, 2018, ArXiv.
[17] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[18] Mohammad Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.
[19] Patrick Judd, et al. Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation, 2020, ArXiv.
[20] Deog-Kyoon Jeong, et al. A³: Accelerating Attention Mechanisms in Neural Networks with Approximation, 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.