Frugal Following: Power Thrifty Object Detection and Tracking for Mobile Augmented Reality

Accurate tracking of real-world objects is highly desirable in Augmented Reality (AR) because it enables proper placement of virtual objects in a user's view. Deep neural networks (DNNs) detect and track objects with high precision, but their energy cost can be prohibitive on mobile devices. To reduce energy drain while maintaining good tracking precision, we develop a novel software framework called MARLIN. MARLIN invokes a DNN only as needed, to detect new objects or to recapture objects whose appearance changes significantly. Between DNN executions, it employs lightweight methods to track the detected objects with high fidelity. We experiment with several baseline DNN models optimized for mobile devices and, via both offline and live object tracking experiments on two different Android phones (one utilizing a mobile GPU), show that MARLIN compares favorably in accuracy while saving significant energy. Specifically, MARLIN reduces energy consumption by up to 73.3% (compared to an approach that executes the best baseline DNN continuously) and improves accuracy by up to 19× (compared to an approach that infrequently executes the same best baseline DNN). Moreover, in 75% or more of the cases MARLIN incurs at most a 7.36% reduction in location accuracy (measured with the common intersection-over-union, or IOU, metric), and in more than 46% of the cases it even improves the IOU compared to the continuous, best DNN approach.
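As a point of reference for the location-accuracy figures, the IOU metric can be sketched as below. This is a minimal illustration of the standard intersection-over-union definition, not code from MARLIN; the `Box` layout `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for this example.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes are treated as having no overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    tracked = (0.0, 0.0, 2.0, 2.0)       # hypothetical tracker output
    ground_truth = (1.0, 1.0, 3.0, 3.0)  # hypothetical annotation
    print(iou(tracked, ground_truth))
```

An IOU of 1.0 means the tracked box coincides with the ground-truth box, and 0.0 means the two boxes do not overlap, so a small percentage reduction in IOU corresponds to a slightly looser fit around the object.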
