Memory Requirements for Convolutional Neural Network Hardware Accelerators

The rapid pace and successful application of machine learning research and development have led to the widespread deployment of deep convolutional neural networks (CNNs). Alongside these algorithmic efforts, the compute- and memory-intensive nature of CNNs has stimulated a large body of work on hardware acceleration for these networks. In this paper, we profile the memory requirements of CNNs in terms of both on-chip memory size and off-chip memory bandwidth in order to understand the impact of the memory system on accelerator design. We show that there are fundamental tradeoffs between performance, bandwidth, and on-chip memory. Further, this paper explores how CNNs designed for different application domains exhibit fundamentally different characteristics. We show that bandwidth and memory requirements for different networks, and occasionally for different layers within a single network, can each vary by multiple orders of magnitude, which makes designing fast and efficient hardware for all CNN applications difficult. To address this, we outline heuristic design points that attempt to optimize for select dataflow scenarios.
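To make the scale of these requirements concrete, the sketch below shows one way per-layer weight storage, activation storage, and a lower bound on off-chip traffic can be estimated for a convolutional layer. This is a minimal illustration, not the paper's profiling methodology: the conv_layer_memory helper and the VGG-style layer shapes are hypothetical examples, chosen only to show how an early layer (large activations, few weights) and a late layer (small activations, many weights) can differ by orders of magnitude.

    # Illustrative sketch only; layer shapes and helper are hypothetical.
    def conv_layer_memory(in_ch, out_ch, k, out_h, out_w, bytes_per_elem=2):
        """Return (weight_bytes, activation_bytes, min_offchip_traffic_bytes)
        for one convolutional layer, assuming 16-bit values by default."""
        weight_bytes = in_ch * out_ch * k * k * bytes_per_elem
        # Output feature map produced by this layer.
        activation_bytes = out_ch * out_h * out_w * bytes_per_elem
        # Lower bound on off-chip traffic when nothing fits on chip:
        # read the weights once and write the output feature map once.
        min_traffic = weight_bytes + activation_bytes
        return weight_bytes, activation_bytes, min_traffic

    if __name__ == "__main__":
        # Hypothetical VGG-style shapes: (in_ch, out_ch, k, out_h, out_w).
        for name, shape in [("early layer", (64, 64, 3, 224, 224)),
                            ("late layer",  (512, 512, 3, 14, 14))]:
            w, a, t = conv_layer_memory(*shape)
            print(f"{name}: weights={w/1e6:.2f} MB, "
                  f"activations={a/1e6:.2f} MB, min traffic={t/1e6:.2f} MB")

Under these example shapes the early layer is dominated by roughly 6 MB of activations with well under 0.1 MB of weights, while the late layer reverses the balance with several MB of weights and a fraction of a MB of activations, illustrating the per-layer variation the abstract describes.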
