Diffy: a Déjà vu-Free Differential Deep Neural Network Accelerator

We show that Deep Convolutional Neural Network (CNN) implementations of computational imaging tasks exhibit spatially correlated values. We exploit this correlation to reduce the amount of computation, communication, and storage needed to execute such CNNs by introducing Diffy, a hardware accelerator that performs Differential Convolution. Diffy stores, communicates, and processes the bulk of the activation values as deltas. Experiments show that, over five state-of-the-art CNN models and for HD resolution inputs, Diffy boosts the average performance by 7.1× over a baseline value-agnostic accelerator [1] and by 1.41× over a state-of-the-art accelerator that processes only the effectual content of the raw activation values [2]. Further, Diffy is respectively 1.83× and 1.36× more energy efficient when considering only the on-chip energy. However, Diffy requires 55% less on-chip storage and 2.5× less off-chip bandwidth compared to storing the raw values using profiled per-layer precisions [3]. Compared to using dynamic per group precisions [4], Diffy requires 32% less storage and 1.43× less off-chip memory bandwidth. More importantly, Diffy provides the performance necessary to achieve real-time processing of HD resolution images with practical configurations. Finally, Diffy is robust and can serve as a general CNN accelerator as it improves performance even for image classification models.

[1]  Alessandro Foi,et al.  Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering , 2007, IEEE Transactions on Image Processing.

[2]  Lei Zhang,et al.  Color demosaicking by local directional interpolation and nonlocal adaptive thresholding , 2011, J. Electronic Imaging.

[3]  Xiaoou Tang,et al.  Compression Artifacts Reduction by a Deep Convolutional Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Aline Roumy,et al.  Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding , 2012, BMVC.

[6]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Joel Emer,et al.  Eyeriss: an Energy-efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks Accessed Terms of Use , 2022 .

[8]  Lynn Jenner Hubble’s High-Definition Panoramic View of the Andromeda Galaxy , 2015 .

[9]  Jean-Michel Morel,et al.  The Noise Clinic: a Blind Image Denoising Algorithm , 2015, Image Process. Line.

[10]  Wangmeng Zuo,et al.  Learning Deep CNN Denoiser Prior for Image Restoration , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Chih-Yuan Yang,et al.  Single-Image Super-Resolution: A Benchmark , 2014, ECCV.

[13]  Gu-Yeon Wei,et al.  Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[14]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[17]  Alan C. Bovik,et al.  A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms , 2006, IEEE Transactions on Image Processing.

[18]  Natalie D. Enright Jerger,et al.  Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[19]  Alberto Delmas,et al.  Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks , 2017, ArXiv.

[20]  Song Han,et al.  EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[21]  Lei Zhang,et al.  Weighted Nuclear Norm Minimization with Application to Image Denoising , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Felix Heide,et al.  IDEAL: Image DEnoising AcceLerator , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  J. E. Krist,et al.  Deconvolution of Hubble Space Telescope images using simulated point spread functions , 1992 .

[24]  William J. Dally,et al.  SCNN: An accelerator for compressed-sparse convolutional neural networks , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[25]  Ezzatollah Salari,et al.  A neural network‐based nonlinear filter for image enhancement , 2002, Int. J. Imaging Syst. Technol..

[26]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[27]  Natalie D. Enright Jerger,et al.  Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks , 2016, ICS.

[28]  Lei Zhang,et al.  FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising , 2017, IEEE Transactions on Image Processing.

[29]  Alberto Delmas,et al.  DPRed: Making Typical Activation Values Matter In Deep Learning Computing , 2018, ArXiv.

[30]  Erich Elsen,et al.  Deep Speech: Scaling up end-to-end speech recognition , 2014, ArXiv.

[31]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Patrick Judd,et al.  Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[33]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Michael J. Black,et al.  Fields of Experts: a framework for learning image priors , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[35]  Michael Elad,et al.  On Single Image Scale-Up Using Sparse-Representations , 2010, Curves and Surfaces.

[36]  Bernhard Schölkopf,et al.  Learning to Deblur , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Matthew Mattina,et al.  Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).

[39]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Kyoung Mu Lee,et al.  Deeply-Recursive Convolutional Network for Image Super-Resolution , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Jan Kotera,et al.  Convolutional Neural Networks for Direct Text Deblurring , 2015, BMVC.

[42]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Frédo Durand,et al.  Deep joint demosaicking and denoising , 2016, ACM Trans. Graph..

[44]  Andreas Moshovos,et al.  Bit-Pragmatic Deep Neural Network Computing , 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[45]  Luca Benini,et al.  CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data , 2017, ICDSC.

[46]  Xuan Yang,et al.  A Systematic Approach to Blocking Convolutional Neural Networks , 2016, ArXiv.

[47]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[48]  Tae Hyun Kim,et al.  Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).