22.9 A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management
Thierry Tambe, J. Zhang, Coleman Hooper, Tianyu Jia, P. Whatmough, Joseph Zuckerman, Maico Cassel dos Santos, Erik Jens Loscalzo, Davide Giri, Kenneth E. Shepard, L. Carloni, Alexander M. Rush, D. Brooks, Gu-Yeon Wei
[1] Alexander M. Rush et al. A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs, IEEE Journal of Solid-State Circuits, 2023.
[2] C. T. Gray et al. A 17–95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm, 2022 IEEE Symposium on VLSI Technology and Circuits, 2022.
[3] Jingchuang Wei et al. A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing, 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022.
[4] Alexander M. Rush et al. EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference, MICRO, 2021.
[5] Visar Berisha et al. An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition, IEEE Journal of Solid-State Circuits, 2020.
[6] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL, 2019.