Tracking 3-D Motion of Dynamic Objects Using Monocular Visual-Inertial Sensing

Six-degree-of-freedom (6-DoF) visual tracking of dynamic objects is fundamental to a wide variety of robotics and augmented reality (AR) applications. Key to this problem is accurately measuring the distance to dynamic objects, which is usually done with stereo cameras, RGB-D sensors, or LiDARs. In this paper, however, we address the problem using only a monocular camera rigidly mounted to a low-cost inertial measurement unit (IMU). This is a lightweight, compact, and low-cost solution that is particularly suitable for tracking dynamic objects from drones or mobile phones. Starting from a generic image-based two-dimensional tracker, we propose a novel method that resolves the object scale ambiguity of monocular vision geometrically, based on correlation analysis. This enables accurate metric three-dimensional tracking of arbitrary objects without requiring any prior knowledge of the object's shape or size. We discuss the applicability of the method by analyzing the observability condition and the degenerate cases for object scale recovery. Simulation and real-world experimental results with ground-truth comparison, along with AR application examples, demonstrate the feasibility of the proposed 6-DoF tracking method.
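The object scale ambiguity mentioned above can be illustrated with a small sketch. It is not the paper's scale-recovery method; it only shows, under an ideal pinhole model, why a single camera cannot tell a small nearby object from a larger one proportionally farther away. The intrinsics (`fx`, `fy`, `cx`, `cy`), the box model, and the helper names `project` and `place` are assumptions chosen for illustration.

```python
def project(points, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of 3-D points (camera frame, z forward) to pixels."""
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in points]


def place(model, offset, scale=1.0):
    """Scale an object model about its origin, then translate it by `offset`."""
    ox, oy, oz = offset
    return [(scale * x + ox, scale * y + oy, scale * z + oz) for x, y, z in model]


# Hypothetical object: the eight corners of a 0.2 m box centered at its origin.
box = [(sx * 0.1, sy * 0.1, sz * 0.1)
       for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

# The same box at 2 m, and a box twice as large at twice the offset.
near = project(place(box, (0.3, -0.1, 2.0)))
far = project(place(box, (0.6, -0.2, 4.0), scale=2.0))

# Largest pixel difference between the two projections (zero up to rounding).
max_diff = max(abs(a - b) for p, q in zip(near, far) for a, b in zip(p, q))
print(max_diff)
```

Because both images coincide, an image-based tracker alone fixes the object pose only up to an unknown scale factor; metric information has to come from another source, which in the setting described above is the inertial sensing.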
