xyzNet: Towards Machine Learning Camera Relocalization by Using a Scene Coordinate Prediction Network

Camera relocalization is a common problem in applications such as augmented reality and robot navigation. Augmented reality in particular requires fast, accurate, and robust camera localization. However, achieving both real-time speed and high accuracy in a single method remains challenging. In this paper, we present a hybrid method that combines a machine learning approach with a geometric approach for real-time camera relocalization from a single RGB image. We propose a lightweight Convolutional Neural Network (CNN), called xyzNet, to efficiently and robustly regress the 3D world coordinates of key-points in an image. The geometric information from the resulting 2D-3D correspondences then allows us to remove ambiguous predictions and compute a more accurate camera pose. Moreover, we report favorable results compared with previous machine learning based approaches in terms of accuracy and performance on different datasets, and we show that the method can address challenges posed by dynamic scenes.
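The geometric filtering step can be illustrated with a minimal sketch. The abstract does not specify how ambiguous predictions are detected, so the code below assumes a common choice: a reprojection-error check under a pinhole camera model, as used inside RANSAC-style pose estimation. The function names (`project`, `inlier_mask`), the intrinsics (`fx`, `fy`, `cx`, `cy`), and the 5-pixel threshold are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import math

def project(point_w, R, t, fx, fy, cx, cy):
    """Project a 3D world point into the image with a pinhole model.

    R is a 3x3 rotation (list of rows) and t a 3-vector such that
    X_cam = R * X_world + t. Returns None if the point is behind the camera.
    """
    xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)

def inlier_mask(keypoints_2d, predicted_3d, R, t, fx, fy, cx, cy, max_err_px=5.0):
    """Flag 2D-3D correspondences that agree with a candidate pose.

    A correspondence is kept when the predicted 3D point reprojects
    within max_err_px pixels of its 2D key-point (assumed threshold).
    """
    mask = []
    for (u, v), pw in zip(keypoints_2d, predicted_3d):
        uv = project(pw, R, t, fx, fy, cx, cy)
        ok = uv is not None and math.hypot(uv[0] - u, uv[1] - v) <= max_err_px
        mask.append(ok)
    return mask

# Toy data: identity pose, three consistent predictions, one ambiguous one.
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_zero = [0.0, 0.0, 0.0]
keypoints = [(320.0, 240.0), (371.0, 239.0), (320.0, 190.0), (100.0, 100.0)]
predictions = [(0.0, 0.0, 2.0), (0.2, 0.0, 2.0), (0.0, -0.4, 4.0), (0.0, 0.0, 2.0)]

mask = inlier_mask(keypoints, predictions, R_identity, t_zero,
                   fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(mask)
```

In a full pipeline of this kind, the candidate pose would typically come from a perspective-n-point solver run on random minimal subsets of the correspondences, and the pose with the most inliers would be refined on the retained set. That outer loop is omitted here.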
