A 40nm 4.81TFLOPS/W 8b Floating-Point Training Processor for Non-Sparse Neural Networks Using Shared Exponent Bias and 24-Way Fused Multiply-Add Tree

Recent works on mobile deep-learning processors have presented designs that exploit sparsity [2, 3], which is commonly found in various neural networks. However, as the machine learning community has shifted toward non-sparse activation functions such as Leaky ReLU and Swish for better training convergence, state-of-the-art models no longer exhibit the sparsity found in conventional ReLU-based models (Fig. 9.3.1, top). Moreover, in contrast to error-tolerant image classification, more difficult tasks such as image super-resolution require higher precision than plain 8b integers, not only for training but also for inference, to avoid large accuracy degradation (Fig. 9.3.1, bottom). These changes pose new challenges for mobile deep-learning processors: they must process non-sparse networks efficiently and maintain higher precision for more challenging tasks.
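The sparsity gap between ReLU-based and Leaky-ReLU/Swish-based models can be illustrated with a minimal sketch (not from the paper, using synthetic zero-mean pre-activations rather than a trained network such as those measured in Fig. 9.3.1): ReLU zeroes out roughly half of the outputs, while Leaky ReLU and Swish leave essentially every output non-zero, giving a sparsity-exploiting accelerator nothing to skip.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)          # synthetic pre-activations

relu       = np.maximum(x, 0.0)             # ReLU(x)      = max(x, 0)
leaky_relu = np.where(x > 0, x, 0.01 * x)   # LeakyReLU(x) with slope 0.01
swish      = x / (1.0 + np.exp(-x))         # Swish(x)     = x * sigmoid(x)

for name, y in [("ReLU", relu), ("Leaky ReLU", leaky_relu), ("Swish", swish)]:
    # Fraction of exactly-zero activations, i.e. the work a zero-skipping
    # (sparsity-exploiting) datapath could avoid.
    print(f"{name:<10s} zero-activation ratio: {np.mean(y == 0.0):.2%}")
```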