Toward Mobile 3D Vision

In the past few years, the computer vision community has developed numerous novel technologies for 3D vision (e.g., 3D object detection and classification, and 3D scene segmentation). In this work, we explore the opportunities these innovations bring for enabling real-time 3D vision on mobile devices. Mobile 3D vision has use cases in emerging applications such as autonomous driving, drone navigation, and augmented reality (AR). The key differences between 3D vision and 2D vision stem mainly from the input data format (i.e., point clouds or 3D meshes vs. 2D images). Hence, the key challenge of 3D vision is that it can be more computation intensive and memory hungry than 2D vision, due to the additional dimension of the input data. For example, our preliminary measurement study of several state-of-the-art machine learning models for 3D vision shows that none of them can execute faster than one frame per second on smartphones. Motivated by these challenges, we present in this position paper a research agenda for systems support for real-time mobile 3D vision, focusing on improving its computation efficiency and memory utilization.
