Memory reduction method for deep neural network training

Training deep neural networks requires a large amount of memory, which makes very deep networks difficult to fit into accelerator memory. To overcome this limitation, we present a method that reduces the memory required to train a deep neural network. The method suppresses memory growth during the backward pass by reusing the memory regions allocated for the forward pass. Experimental results show that our method reduced the memory occupied during training by 44.7% on VGGNet without affecting accuracy. Our method also enabled faster training by allowing the mini-batch size to be doubled.
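The core idea, reusing forward-pass buffers to hold gradients during the backward pass, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a toy MLP with equal-width ReLU layers (so that a freed activation buffer has exactly the shape of the next gradient buffer) and a toy squared loss; all names below are illustrative.

# Minimal sketch of backward-pass buffer reuse, NOT the paper's implementation.
# Assumption: equal-width hidden layers, so a dead activation buffer can be
# reused directly as the gradient buffer for the layer below it.
import numpy as np

rng = np.random.default_rng(0)
batch, width, depth = 32, 256, 4

# Parameters and pre-allocated forward buffers (one activation per layer).
Ws = [rng.standard_normal((width, width)) * 0.05 for _ in range(depth)]
acts = [np.empty((batch, width)) for _ in range(depth + 1)]  # acts[0] is the input

acts[0][...] = rng.standard_normal((batch, width))
for i, W in enumerate(Ws):                       # forward pass
    np.maximum(acts[i] @ W, 0.0, out=acts[i + 1])

grads_W = [np.empty_like(W) for W in Ws]

# Backward pass. 'delta' holds dLoss/d(acts[i+1]); only the initial loss
# gradient allocates a new activation-sized buffer.
delta = 2.0 * (acts[depth] - 1.0)                # gradient of a toy squared loss
for i in reversed(range(depth)):
    delta *= (acts[i + 1] > 0)                   # ReLU backward, in place
    grads_W[i][...] = acts[i].T @ delta          # weight gradient for layer i
    # acts[i+1] is dead from this point on: reuse its storage for the
    # gradient w.r.t. acts[i] instead of allocating a fresh buffer.
    reused = acts[i + 1]
    np.matmul(delta, Ws[i].T, out=reused)
    delta = reused

Because the gradient with respect to each layer's input is written into the storage of the activation that has just become dead, the backward pass allocates no new activation-sized buffers; this is the kind of forward-region reuse the abstract describes, though the paper's actual mechanism may differ in detail.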
