Real-Time Deep Video Analytics on Mobile Devices

Real-time mobile video analytics plays an increasingly important role in daily life, in applications such as smart driving, unmanned delivery, cashier-free stores, and video surveillance. Existing video analytics systems run complex deep models to detect and recognize objects in video frames. However, running these deep models on mobile devices cannot meet the real-time requirement. This paper develops Sight, a novel mobile video analytics system. Its unique features include (i) high accuracy, (ii) real-time operation, and (iii) running exclusively on a mobile device without the need for an edge/cloud server or network connectivity. At its heart lies an effective technique to reliably extract motion from video frames and use that motion to speed up video analytics. Unlike existing motion extraction techniques, ours is robust to background noise and to changes in object size. Extensive evaluation results show that we support real-time object tracking at 30 frames per second (fps) on an Nvidia Jetson TX2. For single-object tracking, Sight improves the average Intersection-over-Union (IoU) by 88%, improves the mean Average Precision (mAP) by 207%, and reduces average hardware resource usage by 45% over the state-of-the-art approach. For multi-object tracking, Sight improves IoU by 69%, improves mAP by 173%, and reduces resource usage by around 32% over the state-of-the-art approach.
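Because tracking accuracy is reported as IoU, a minimal sketch of that metric for axis-aligned bounding boxes may help. The names `Box`, `iou`, and `shift` are illustrative and not taken from Sight. The `shift` helper is a hypothetical, translation-only motion model used only to show why compensating for motion can raise IoU when a detection from an earlier deep-model run goes stale. It is not the paper's motion extraction technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Degenerate boxes (x2 <= x1 or y2 <= y1) have zero area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two boxes, a value in [0, 1]."""
    # The intersection is itself a box; it is degenerate when a and b do not overlap.
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


def shift(box: Box, dx: float, dy: float) -> Box:
    """Hypothetical helper: translate a box by an estimated motion (dx, dy)."""
    return Box(box.x1 + dx, box.y1 + dy, box.x2 + dx, box.y2 + dy)


if __name__ == "__main__":
    detected = Box(0, 0, 10, 10)   # box from an earlier deep-model run
    truth = Box(5, 0, 15, 10)      # ground truth in the current frame
    print(iou(detected, truth))                # stale box: 50 / 150
    print(iou(shift(detected, 5, 0), truth))   # motion-compensated box: 1.0
```

In this toy case the object has moved 5 pixels to the right, so reusing the stale detection scores an IoU of 1/3, while shifting it by the (here, exactly known) motion restores full overlap. Real motion estimates are noisy, which is why robustness to background noise and object-size changes matters.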
