Layer Skip Learning using LARS variables for 39% Faster Conversion Time and Lower Bandwidth

In this paper, a method for improving the trade-off between calculation time and recognition accuracy in deep learning is proposed. A major problem in deep learning is that a long calculation time is required to obtain high recognition accuracy. This problem limits the implementation of deep learning in hardware and its application to real problems. In this study, layer-wise adaptive rate scaling (LARS) variables are adopted to evaluate whether each layer still needs to be trained. When the LARS variable of a convolution layer exceeds a threshold value, further learning of that layer is considered unnecessary, and the layer is skipped. Once a layer is recognized as no longer requiring learning, only the layers below it are trained in the next epoch. By adaptively skipping layers, the calculation time is reduced and the recognition accuracy is improved. Consequently, the proposed method accelerates training of VGG-F: the highest top1 and top5 test accuracies are reached faster by speed-up factors of 2.14 and 2.25, respectively. Moreover, the final top1 and top5 test accuracies are improved by 3.0% and 2.8%, respectively. In addition, the number of operations is reduced by approximately 39.0% and the required bandwidth by 38.9%, compared with conventional full-layer learning.
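The abstract does not include the authors' implementation, but the skipping rule it describes can be illustrated with a short sketch. The PyTorch snippet below freezes a convolution layer once a LARS-style trust ratio ||w|| / ||∇w|| exceeds a threshold, so that only the remaining layers are updated in later epochs. The exact ratio form, the threshold value LARS_THRESHOLD, and the helper names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): skip further learning of a
# convolution layer once its LARS trust ratio exceeds a threshold.

import torch
import torch.nn as nn

# Hypothetical skip threshold; the paper's actual value is not given in the abstract.
LARS_THRESHOLD = 10.0


def lars_ratio(param: torch.Tensor) -> float:
    """LARS-style trust ratio ||w|| / ||grad w|| for one parameter tensor (assumed form)."""
    if param.grad is None:
        return float("inf")
    w_norm = param.detach().norm()
    g_norm = param.grad.detach().norm()
    return float(w_norm / (g_norm + 1e-12))


def update_skip_set(model: nn.Module, skipped: set) -> None:
    """After a backward pass, mark conv layers whose LARS ratio exceeds the
    threshold as 'learned enough' and freeze them for subsequent epochs."""
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d) and name not in skipped:
            if lars_ratio(module.weight) > LARS_THRESHOLD:
                skipped.add(name)
                for p in module.parameters():
                    # Frozen parameters receive no gradients, so their
                    # computation and weight traffic are saved from now on.
                    p.requires_grad_(False)
```

In a training loop, update_skip_set(model, skipped) would be called once per epoch after a backward pass. Because frozen layers no longer need gradient computation or weight updates, both the operation count and the communication bandwidth shrink, which is the effect the paper quantifies.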
