An Automatic Key-Frame Selection Method for Monocular Visual Odometry of Ground Vehicle

To cope with rapid attitude changes in complex situations such as sharp turns and camera jitter, visual odometry/visual simultaneous localization and mapping (VO/VSLAM) often requires a high camera frame rate. However, a high frame rate poses a great challenge to real-time VO/VSLAM, especially when the vehicle travels along a relatively flat trajectory, which demands higher real-time performance. Therefore, we propose an automatic key-frame selection method based on the motion state of the vehicle, which reduces data redundancy and improves the real-time performance and robustness of VO/VSLAM. First, a pyramidal Kanade–Lucas–Tomasi (KLT) algorithm is used to track the feature points, and the five-point method with RANSAC is used to calculate the essential matrix. Second, a two-step decomposition is used to calculate the inter-frame attitude change, and key-frames are then selected automatically based on the motion state of the vehicle, which is predicted from the attitude change within a short time interval. To evaluate the method, we conduct extensive experiments with monocular visual odometry on real data and on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. We then evaluate the results qualitatively and quantitatively by comparison with the reference trajectory and in terms of relative error (RE), root mean square error (RMSE), and absolute trajectory error (ATE). The results indicate that our method improves real-time performance while maintaining accuracy, with data redundancy reduced by about 40%–60%. In addition, the performance of our method is further verified by comparison with ORB-SLAM, a current representative system.
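The selection idea described above, keeping fewer frames while the attitude changes slowly and more frames during turns, can be illustrated with a minimal sketch. This is not the paper's exact rule. The function names, the 3-degree threshold, the 5-frame maximum gap, and the choice to sum per-frame rotation angles are all assumptions made for illustration. In a real pipeline the inter-frame rotations would come from decomposing the estimated essential matrix; here they are synthetic yaw rotations.

```python
import math


def rotation_angle(R):
    """Rotation angle in radians of a 3x3 rotation matrix (nested lists)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(c)


def yaw_rotation(deg):
    """Synthetic rotation about the camera's vertical (y) axis."""
    a = math.radians(deg)
    return [[math.cos(a), 0.0, math.sin(a)],
            [0.0, 1.0, 0.0],
            [-math.sin(a), 0.0, math.cos(a)]]


def select_keyframes(inter_frame_rotations, angle_threshold_deg=3.0, max_gap=5):
    """Return indices of frames kept as key-frames (hypothetical rule).

    inter_frame_rotations[i] is the rotation between frame i and frame i + 1.
    Frame 0 is always kept. A later frame is kept when the attitude change
    summed since the last key-frame reaches angle_threshold_deg, or when
    max_gap frames have passed without a key-frame.
    """
    keyframes = [0]
    accumulated_deg = 0.0
    frames_since_last = 0
    for i, R in enumerate(inter_frame_rotations, start=1):
        accumulated_deg += math.degrees(rotation_angle(R))
        frames_since_last += 1
        if accumulated_deg >= angle_threshold_deg or frames_since_last >= max_gap:
            keyframes.append(i)
            accumulated_deg = 0.0
            frames_since_last = 0
    return keyframes


# Ten near-straight steps (0.2 degrees each) followed by a six-step turn
# (2 degrees each): key-frames are sparse at first and dense in the turn.
rotations = [yaw_rotation(0.2)] * 10 + [yaw_rotation(2.0)] * 6
print(select_keyframes(rotations))  # [0, 5, 10, 12, 14, 16]
```

Summing per-frame angles is a simple proxy for the attitude change over a short interval; composing the rotation matrices before taking the angle would be a stricter alternative. The maximum gap keeps the tracker from going too long without a key-frame on straight stretches.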
