A 40nm 4.81TFLOPS/W 8b Floating-Point Training Processor for Non-Sparse Neural Networks Using Shared Exponent Bias and 24-Way Fused Multiply-Add Tree
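The title names two ideas: FP8 operands whose exponent range is shifted by a per-tensor shared exponent bias, and a 24-way fused multiply-add tree that sums 24 products in one wide accumulation. The sketch below is only an illustrative NumPy emulation of those two ideas under assumed parameters (a 1-5-2 sign/exponent/mantissa FP8 split and a max-magnitude bias policy); it is not the processor's actual datapath or number format, which this listing does not specify.

```python
import numpy as np

# Assumed FP8 format: 1 sign, 5 exponent, 2 mantissa bits (illustrative only).
EXP_BITS, MAN_BITS = 5, 2

def shared_exponent_bias(x):
    """Pick one exponent bias for the whole tensor from its largest magnitude
    (assumed policy), so small values share the tensor's dynamic range."""
    max_exp = int(np.floor(np.log2(np.max(np.abs(x)) + 1e-38)))
    return max_exp - (2 ** (EXP_BITS - 1) - 1)

def quantize_fp8(x, bias):
    """Round each value to the nearest representable FP8 value under the shared
    bias (ignores subnormal/saturation corner cases for brevity)."""
    sign = np.sign(x)
    mag = np.abs(x)
    exp = np.clip(np.floor(np.log2(mag + 1e-38)), bias, bias + 2 ** EXP_BITS - 1)
    scale = 2.0 ** (exp - MAN_BITS)            # quantization step at this exponent
    return (sign * np.round(mag / scale) * scale).astype(np.float32)

def fma_tree_24(a_fp8, b_fp8):
    """24-way fused multiply-add: all 24 products are accumulated in one wide
    (here float32) sum instead of 24 sequentially rounded additions."""
    assert a_fp8.shape == b_fp8.shape == (24,)
    return np.float32(np.dot(a_fp8.astype(np.float32), b_fp8.astype(np.float32)))

# Toy usage: quantize small gradients and activations, then accumulate.
grads = (np.random.randn(24) * 1e-3).astype(np.float32)
acts = np.random.randn(24).astype(np.float32)
partial_sum = fma_tree_24(quantize_fp8(grads, shared_exponent_bias(grads)),
                          quantize_fp8(acts, shared_exponent_bias(acts)))
```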
[1] Hoi-Jun Yoo, et al. 7.7 LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16, 2019 IEEE International Solid-State Circuits Conference (ISSCC), 2019.
[2] Hoi-Jun Yoo, et al. 7.4 GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation, 2020 IEEE International Solid-State Circuits Conference (ISSCC), 2020.
[3] Youngwoo Kim, et al. A 2.1TFLOPS/W Mobile Deep RL Accelerator with Transposable PE Array and Experience Compression, 2019 IEEE International Solid-State Circuits Conference (ISSCC), 2019.
[4] Swagath Venkataramani, et al. A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference, 2020 IEEE Symposium on VLSI Circuits, 2020.
[5] Joel Silberman, et al. A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference, 2018 IEEE Symposium on VLSI Circuits, 2018.
[6] Daniel Brand, et al. Training Deep Neural Networks with 8-bit Floating Point Numbers, NeurIPS, 2018.