Low-memory convolutional neural networks through incremental depth-first processing

We introduce an incremental processing scheme for convolutional neural network (CNN) inference, targeted at embedded applications with limited memory budgets. Instead of processing the network layer by layer, the scheme propagates each individual input pixel through all parts of the network it can influence under the given structural constraints. This depth-first updating scheme comes with hard bounds on the memory footprint: the memory required is constant in the case of 1D input and proportional to the square root of the input dimension in the case of 2D input.

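The sketch below illustrates the idea for a 1D input stream under illustrative assumptions (the layer widths, kernel sizes, and ReLU activations are not taken from the paper, and this is not the authors' implementation): each convolutional layer keeps only a small sliding window of its most recent inputs, so a new sample can be pushed depth-first through every layer it already influences, and the total buffer memory stays constant in the input length.

```python
# Minimal sketch of depth-first, incremental inference for a stack of
# 1D convolutions. Each layer keeps only a ring buffer of its most
# recent inputs (kernel_size values per channel), so memory does not
# grow with the input length. Layer sizes and the ReLU nonlinearity
# are illustrative assumptions, not taken from the paper.
import numpy as np

class StreamingConv1D:
    def __init__(self, in_ch, out_ch, k, rng):
        self.k = k
        self.w = rng.standard_normal((out_ch, in_ch, k)) * 0.1  # weights
        self.b = np.zeros(out_ch)
        self.buf = np.zeros((in_ch, k))   # sliding window over the input stream
        self.seen = 0                     # samples received so far

    def push(self, x):
        """Feed one input column (shape: in_ch). Returns one output column
        (shape: out_ch) once the window is full, otherwise None."""
        self.buf = np.roll(self.buf, -1, axis=1)
        self.buf[:, -1] = x
        self.seen += 1
        if self.seen < self.k:            # "valid" convolution: window not yet filled
            return None
        y = np.einsum('oik,ik->o', self.w, self.buf) + self.b
        return np.maximum(y, 0.0)         # ReLU

def depth_first_inference(layers, stream):
    """Propagate each input sample through every layer it can already
    influence, instead of materialising whole feature maps per layer."""
    outputs = []
    for x in stream:
        col = x
        for layer in layers:
            col = layer.push(col)
            if col is None:               # deeper layers are not influenced yet
                break
        else:
            outputs.append(col)
    return outputs

rng = np.random.default_rng(0)
net = [StreamingConv1D(1, 4, 3, rng), StreamingConv1D(4, 8, 3, rng)]
signal = np.sin(np.linspace(0, 6, 32))[:, None]   # 32 samples, 1 channel
result = depth_first_inference(net, signal)
print(len(result), result[0].shape)               # 28 output columns of 8 channels
```

Note that the per-layer state is bounded by the kernel width times the channel count, independent of how many samples have been streamed, which is the constant-memory property claimed for the 1D case.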