Paper Information - Realtime 3D Object Detection for Headsets

Realtime 3D Object Detection for Headsets

Mobile headsets should be capable of understanding 3D physical environments to offer a truly immersive experience for augmented/mixed reality (AR/MR). However, their small form factor and limited computation resources make it extremely challenging to execute 3D vision algorithms in real time, as these algorithms are known to be more compute-intensive than their 2D counterparts. In this paper, we propose DeepMix, a mobility-aware, lightweight, and hybrid 3D object detection framework for improving the user experience of AR/MR on mobile headsets. Motivated by our analysis and evaluation of state-of-the-art 3D object detection models, DeepMix intelligently combines edge-assisted 2D object detection with novel, on-device 3D bounding box estimation that leverages depth data captured by headsets. This leads to low end-to-end latency and significantly boosts detection accuracy in mobile scenarios. A unique feature of DeepMix is that it fully exploits the mobility of headsets to fine-tune detection results and boost detection accuracy. To the best of our knowledge, DeepMix is the first 3D object detection framework that achieves 30 FPS (i.e., an end-to-end latency much lower than the stringent 100 ms requirement of interactive AR/MR). We implement a prototype of DeepMix on Microsoft HoloLens and evaluate its performance via both extensive controlled experiments and a user study with 30+ participants. Compared to a baseline that uses existing 3D object detection models, DeepMix not only improves detection accuracy by 9.1–37.3% but also reduces end-to-end latency by 2.68–9.15×.
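
To make the hybrid idea concrete, the following is a minimal sketch of how a 2D detection could be lifted to a 3D bounding box using depth data. It is not DeepMix's actual algorithm, which the abstract does not specify. The function name `estimate_3d_box`, the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the median-based background filter, and the `tol` parameter are all assumptions made for illustration.

```python
from statistics import median_low


def estimate_3d_box(depth, box2d, fx, fy, cx, cy, tol=0.5):
    """Axis-aligned 3D box in the camera frame from a 2D box plus depth.

    depth: 2D list of depths in meters, indexed depth[v][u]; 0 means no reading.
    box2d: (u_min, v_min, u_max, v_max) in pixels, max bounds exclusive.
    tol:   keep pixels whose depth is within tol meters of the median depth,
           a crude (hypothetical) way to discard background pixels.
    Returns ((x_min, y_min, z_min), (x_max, y_max, z_max)), or None if the
    box contains no valid depth reading.
    """
    u_min, v_min, u_max, v_max = box2d
    pixels = [(u, v, depth[v][u])
              for v in range(v_min, v_max)
              for u in range(u_min, u_max)
              if depth[v][u] > 0]
    if not pixels:
        return None

    # median_low always returns an actual sample, so at least one pixel
    # survives the filter below.
    z_med = median_low([z for _, _, z in pixels])

    # Back-project the surviving pixels with the pinhole camera model.
    points = [((u - cx) * z / fx, (v - cy) * z / fy, z)
              for u, v, z in pixels
              if abs(z - z_med) <= tol]

    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


if __name__ == "__main__":
    # Toy 2x2 depth map: three object pixels at 2 m, one background pixel at 5 m.
    depth_map = [[2.0, 2.0],
                 [2.0, 5.0]]
    print(estimate_3d_box(depth_map, (0, 0, 2, 2), 1.0, 1.0, 0.0, 0.0))
```

In this sketch the expensive step (2D detection) could run on an edge server, while the back-projection is a cheap per-pixel computation that could plausibly run on the headset. A real system would also need to handle oriented boxes, sensor noise, and alignment between the color and depth cameras.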
