A 40nm 4.81TFLOPS/W 8b Floating-Point Training Processor for Non-Sparse Neural Networks Using Shared Exponent Bias and 24-Way Fused Multiply-Add Tree

Recent works on mobile deep-learning processors have presented designs that exploit sparsity [2, 3], which is commonly found in various neural networks. However, as the machine learning community has shifted toward non-sparse activation functions such as Leaky ReLU and Swish for better training convergence, state-of-the-art models no longer exhibit the sparsity found in conventional ReLU-based models (Fig. 9.3.1, top). Moreover, in contrast to error-tolerant image classification, more difficult tasks such as image super-resolution require higher precision than plain 8b integers, not only for training but also for inference, to avoid large accuracy degradation (Fig. 9.3.1, bottom). These changes pose new challenges for mobile deep-learning processors: they must process non-sparse networks efficiently and maintain higher precision for more challenging tasks.
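The sparsity gap between ReLU-based and Leaky-ReLU/Swish-based models can be illustrated with a minimal sketch (not from the paper, using synthetic zero-mean pre-activations rather than a trained network such as those measured in Fig. 9.3.1): ReLU zeroes out roughly half of the outputs, while Leaky ReLU and Swish leave essentially every output non-zero, giving a sparsity-exploiting accelerator nothing to skip.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)          # synthetic pre-activations

relu       = np.maximum(x, 0.0)             # ReLU(x)      = max(x, 0)
leaky_relu = np.where(x > 0, x, 0.01 * x)   # LeakyReLU(x) with slope 0.01
swish      = x / (1.0 + np.exp(-x))         # Swish(x)     = x * sigmoid(x)

for name, y in [("ReLU", relu), ("Leaky ReLU", leaky_relu), ("Swish", swish)]:
    # Fraction of exactly-zero activations, i.e. the work a zero-skipping
    # (sparsity-exploiting) datapath could avoid.
    print(f"{name:<10s} zero-activation ratio: {np.mean(y == 0.0):.2%}")
```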