eSRCNN: A Framework for Optimizing Super-Resolution Tasks on Diverse Embedded CNN Accelerators

CNN-based super-resolution (SR), one of the most representative low-level vision tasks, is a promising way to improve users' quality of service (QoS) on IoT devices that suffer from limited network bandwidth and storage capacity, because it effectively enhances image/video resolution. Although prior embedded CNN accelerators deliver impressive performance and energy efficiency, they are poorly suited to SR tasks because of the large number of off-chip memory accesses such tasks incur. In this work, we present eSRCNN, a framework that enables energy-efficient SR on diverse embedded CNN accelerators by decreasing off-chip memory accesses. The framework consists of three steps: network reformation using cross-layer weight scaling, precision minimization with priority-based quantization, and activation-map compression that exploits data locality. As a result, the energy consumption of off-chip memory accesses is reduced by up to 71.89% with less than 3.52% area overhead.
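
The abstract names cross-layer weight scaling as the first step but does not spell out the algorithm. The sketch below is a minimal NumPy illustration of the general cross-layer scaling idea, not the authors' implementation; all function and variable names are our own. It rescales the channels shared by two consecutive convolution layers so that their per-channel weight ranges are equalized while the network's output is preserved, since ReLU is positive-homogeneous (ReLU(s*x) = s*ReLU(x) for s > 0). Equalized ranges tighten the fixed-point range each layer needs, which is what makes the subsequent precision-minimization step effective.

```python
import numpy as np

def cross_layer_scale(w1, b1, w2):
    """Rescale two consecutive conv layers without changing the output.

    Assumed (hypothetical) layout:
      w1: layer i weights,   shape (C1, C0, kh, kw)
      b1: layer i bias,      shape (C1,)
      w2: layer i+1 weights, shape (C2, C1, kh, kw)

    For each channel c shared by the two layers, layer i's filter and
    bias are multiplied by s[c] and layer i+1's matching input slice is
    divided by s[c]. With a ReLU in between, the composed function is
    unchanged, but both layers end up with the same per-channel range
    sqrt(r1[c] * r2[c]).
    """
    # Per-channel weight ranges on either side of the shared boundary.
    r1 = np.abs(w1).max(axis=(1, 2, 3))   # output channels of layer i
    r2 = np.abs(w2).max(axis=(0, 2, 3))   # input channels of layer i+1
    # Choose s so that r1 * s == r2 / s, i.e. the ranges are equalized.
    s = np.sqrt(r2 / (r1 + 1e-12))

    w1_scaled = w1 * s[:, None, None, None]
    b1_scaled = b1 * s
    w2_scaled = w2 / s[None, :, None, None]
    return w1_scaled, b1_scaled, w2_scaled
```

Because the transformation is exact for positive-homogeneous activations, it can be applied offline, layer pair by layer pair, before quantization; no retraining is required for the scaling step itself.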
