Realtime 3D Object Detection for Headsets

Mobile headsets should be capable of understanding 3D physical environments to offer a truly immersive experience for augmented/mixed reality (AR/MR). However, their small form factor and limited computation resources make it extremely challenging to execute 3D vision algorithms in real time, as these algorithms are known to be more compute-intensive than their 2D counterparts. In this paper, we propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework for improving the user experience of AR/MR on mobile headsets. Motivated by our analysis and evaluation of state-of-the-art 3D object detection models, DeepMix intelligently combines edge-assisted 2D object detection with novel, on-device 3D bounding box estimation that leverages depth data captured by headsets. This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios. A unique feature of DeepMix is that it fully exploits the mobility of headsets to fine-tune detection results and boost detection accuracy. To the best of our knowledge, DeepMix is the first 3D object detection framework that achieves 30 FPS (i.e., an end-to-end latency much lower than the stringent 100 ms requirement of interactive AR/MR). We implement a prototype of DeepMix on Microsoft HoloLens and evaluate its performance via both extensive controlled experiments and a user study with 30+ participants. Compared to a baseline that uses existing 3D object detection models, DeepMix not only improves detection accuracy by 9.1–37.3% but also reduces end-to-end latency by 2.68–9.15×.
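
To make the hybrid idea concrete, the sketch below shows one simple way a 2D detection could be lifted to a 3D bounding box using a depth image. It is an illustration only and not DeepMix's actual estimator, which the abstract does not specify. The function name `estimate_3d_box`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the median-depth filter, and the `band` threshold are all assumptions introduced for this example.

```python
from statistics import median


def estimate_3d_box(depth, box2d, fx, fy, cx, cy, band=0.5):
    """Lift a 2D box to an axis-aligned 3D box in the camera frame.

    depth  : list of rows, depth in meters per pixel (0 means no reading)
    box2d  : (u_min, v_min, u_max, v_max) in pixels, max edges exclusive
    fx..cy : pinhole camera intrinsics (assumed known from calibration)
    band   : hypothetical tolerance (meters) around the median depth,
             used to discard background pixels inside the 2D box
    """
    u0, v0, u1, v1 = box2d
    # Collect valid depth samples that fall inside the 2D detection.
    samples = [
        (u, v, depth[v][u])
        for v in range(v0, v1)
        for u in range(u0, u1)
        if depth[v][u] > 0
    ]
    if not samples:
        return None

    # Assume the object dominates the box, so the median depth lies on it.
    z_med = median(z for _, _, z in samples)

    # Back-project the remaining pixels with the pinhole camera model.
    points = [
        ((u - cx) * z / fx, (v - cy) * z / fy, z)
        for u, v, z in samples
        if abs(z - z_med) <= band
    ]
    if not points:
        return None

    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs
```

In a pipeline of this kind, the 2D box would come from the edge server while the depth lookup and back-projection run on the headset, so only the lightweight geometric step sits on the device. For reference, 30 FPS corresponds to a budget of roughly 33 ms per frame.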
