论文信息 - CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters. However, there are many applications such as smart surveillance cameras that require or would benefit from on-site processing. To this end, we propose and evaluate a novel algorithm for change-based evaluation of CNNs for video data recorded with a static camera setting, exploiting the spatio-temporal sparsity of pixel changes. We achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic benchmark with a negligible accuracy loss of less than 0.1% and no retraining of the network. The resulting energy efficiency is 10x higher than that of per-frame evaluation and reaches an equivalent of 328 GOp/s/W on the Tegra X1 platform.

[1] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[2] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[3] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4] Yoshua Bengio,et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations , 2015, NIPS.

[5] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[6] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[7] Berin Martini,et al. NeuFlow: A runtime reconfigurable dataflow processor for vision , 2011, CVPR 2011 WORKSHOPS.

[8] Andrew Lavin,et al. maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs , 2015, ArXiv.

[9] Luca Benini,et al. YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights , 2016, 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI).

[10] Luca Benini,et al. Origami: A 803-GOp/s/W Convolutional Network Accelerator , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[11] Alessandro Aimar,et al. NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[12] Hanan Samet,et al. Pruning Filters for Efficient ConvNets , 2016, ICLR.

[13] Vivienne Sze,et al. 14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks , 2016, ISSCC.

[14] Soheil Ghiasi,et al. Hardware-oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.

[15] Sebastian Ramos,et al. The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Sherief Reda,et al. Understanding the impact of precision quantization on the accuracy and energy of neural networks , 2016, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[17] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[19] Qiang Xu,et al. ApproxANN: An approximate computing framework for artificial neural network , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[20] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] Silvio Savarese,et al. Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[22] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Fan Yang,et al. Fast and Robust Face Detection on a Parallel Optimized Architecture Implemented on FPGA , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[24] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[25] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Ali Farhadi,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[27] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Michele Magno,et al. Computationally efficient target classification in multispectral image data with Deep Neural Networks , 2016, Security + Defence.

[30] Trevor Darrell,et al. Clockwork Convnets for Video Semantic Segmentation , 2016, ECCV Workshops.

[31] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[32] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Michele Magno,et al. Accelerating real-time embedded scene labeling with convolutional networks , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[34] Qingshan Liu,et al. Robust Visual Tracking via Convolutional Networks Without Training , 2015, IEEE Transactions on Image Processing.

[35] Andrew Lavin,et al. Fast Algorithms for Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Jeff Johnson,et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation , 2014, ICLR.

[37] Luca Benini,et al. Origami: A Convolutional Network Accelerator , 2015, ACM Great Lakes Symposium on VLSI.

[38] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39] Munchurl Kim,et al. CNN-based in-loop filtering for coding efficiency improvement , 2016, 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP).

[40] Andrew Zisserman,et al. Speeding up Convolutional Neural Networks with Low Rank Expansions , 2014, BMVC.

[41] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.

[42] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[43] Eugenio Culurciello,et al. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[44] Berin Martini,et al. An efficient implementation of deep convolutional neural networks on a mobile coprocessor , 2014, 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS).

[45] Herman Kruegle. Chapter 20 – Video Technology Overview∗ , 2017 .

[46] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[47] Sharath Pankanti,et al. Video surveillance: past, present, and now the future [DSP Forum] , 2013, IEEE Signal Processing Magazine.

[48] Herman Kruegle. Chapter 18 – New Technology , 1995 .