Real-Time Visual–Inertial SLAM Based on Adaptive Keyframe Selection for Mobile AR Applications

Simultaneous localization and mapping (SLAM) is used in many applications, including augmented reality (AR), virtual reality, robotics, drones, and self-driving vehicles. In AR applications, fast camera motion estimation and the recovery of true size and scale are important issues. In this research, we introduce a real-time visual–inertial SLAM system based on adaptive keyframe selection for mobile AR applications. Specifically, the system is built on an adaptive-keyframe-selection visual–inertial odometry method, which combines an adaptive keyframe selection method with a lightweight visual–inertial odometry method. Inertial measurement unit (IMU) data are used to predict the motion state of the current frame, and an adaptive selection method based on learning and automatic setting then decides whether the current frame is a keyframe. Relatively unimportant frames (non-keyframes) are processed with the lightweight visual–inertial odometry method for efficiency and real-time performance. We test the system in a PC environment and compare it with state-of-the-art methods. On the EuRoC dataset, the mean translation root-mean-square error of the keyframe trajectory is 0.067 m without ground-truth scale matching, and the scale error is 0.58%. Moreover, experiments on a mobile device show that the proposed method improves performance by 34.5%–53.8%.
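
To make the two reported accuracy figures concrete, the following is a minimal sketch of how a translation root-mean-square error and a scale error could be computed from an estimated and a ground-truth trajectory. It is not the authors' evaluation code. It assumes the two trajectories are already time-associated and expressed in the same frame, and it assumes one possible definition of scale error (the deviation of the ratio of path lengths from 1); the abstract does not state which definition was used. All function names are hypothetical.

```python
import math


def translation_rmse(estimated, ground_truth):
    """Root-mean-square of per-pose Euclidean position errors (same units as input).

    Assumes both trajectories are lists of (x, y, z) positions that are already
    time-associated and expressed in the same frame; no scale correction is applied.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


def path_length(trajectory):
    """Sum of distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))


def scale_error_percent(estimated, ground_truth):
    """Assumed definition: percent deviation of the path-length ratio from 1."""
    scale = path_length(estimated) / path_length(ground_truth)
    return abs(scale - 1.0) * 100.0


# Toy data: the estimate is stretched by 1% along x relative to ground truth.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.01, 0.0, 0.0), (2.02, 0.0, 0.0)]

rmse = translation_rmse(estimated, ground_truth)
scale_err = scale_error_percent(estimated, ground_truth)
print(f"translation RMSE: {rmse:.4f} m, scale error: {scale_err:.2f}%")
```

With a metric-scale estimate, skipping the scale-matching step means any error in the recovered scale shows up directly in the RMSE, which is why the two figures are reported together.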
