PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing
Leibo Liu | Shaojun Wei | Yang Wang | Dazheng Deng | Shouyi Yin