Real-Time RGB-D Camera Pose Estimation in Novel Scenes Using a Relocalisation Cascade

Camera pose estimation is an important problem in computer vision, with applications as diverse as simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques match the current image against keyframes with known poses coming from a tracker, directly regress the pose, or establish correspondences between keypoints in the current image and points in the scene in order to estimate the pose. In recent years, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but have traditionally needed to be trained offline on the target scene, preventing relocalisation in new environments. Recently, we showed how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. The adapted forests achieved relocalisation performance that was on par with that of offline forests, and our approach was able to estimate the camera pose in close to real time, which made it desirable for systems that require online relocalisation. In this paper, we present an extension of this work that achieves significantly better relocalisation performance whilst running fully in real time. To achieve this, we make several changes to the original approach: (i) instead of simply accepting the camera pose hypothesis produced by RANSAC without question, we make it possible to score the final few hypotheses it considers using a geometric approach and select the most promising one; (ii) we chain several instantiations of our relocaliser (with different parameter settings) together in a cascade, allowing us to try faster but less accurate relocalisation first, only falling back to slower, more accurate relocalisation as necessary; and (iii) we tune the parameters of our cascade, and the individual relocalisers it contains, to achieve effective overall performance. Taken together, these changes allow us to significantly improve upon the performance our original state-of-the-art method was able to achieve on the well-known 7-Scenes and Stanford 4 Scenes benchmarks. As additional contributions, we present a novel way of visualising the internal behaviour of our forests, and use the insights gleaned from this to show how to entirely circumvent the need to pre-train a forest on a generic scene.

[1]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Stefano Soatto,et al.  Really Quick Shift: Image Segmentation on a GPU , 2010, ECCV Workshops.

[3]  Galvez-LópezDorian,et al.  Bags of Binary Words for Fast Place Recognition in Image Sequences , 2012 .

[4]  Lei Deng,et al.  Incremental image set querying based localization , 2016, Neurocomputing.

[5]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[6]  Stefan Leutenegger,et al.  ElasticFusion: Dense SLAM Without A Pose Graph , 2015, Robotics: Science and Systems.

[7]  Torsten Sattler,et al.  Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Juho Kannala,et al.  Camera Relocalization by Computing Pairwise Relative Poses Using Convolutional Neural Network , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[9]  Andrew W. Fitzgibbon,et al.  Large-scale and drift-free surface reconstruction using online subvolume registration , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  SunXin,et al.  Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices , 2015 .

[11]  Olaf Kähler,et al.  Very High Frame Rate Volumetric Integration of Depth Images on Mobile Devices , 2015, IEEE Transactions on Visualization and Computer Graphics.

[12]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[13]  Dieter Fox,et al.  Self-Supervised Visual Descriptor Learning for Dense Correspondence , 2017, IEEE Robotics and Automation Letters.

[14]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Andrew W. Fitzgibbon,et al.  Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Esa Rahtu,et al.  Image-Based Localization Using Hourglass Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[17]  Roberto Cipolla,et al.  Geometric Loss Functions for Camera Pose Regression with Deep Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[19]  Andrew W. Fitzgibbon,et al.  Multi-output Learning for Camera Relocalization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  W. Kabsch A solution for the best rotation to relate two sets of vectors , 1976 .

[22]  Dorian Gálvez-López,et al.  Real-time loop detection with bags of binary words , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[23]  Matthew Turk,et al.  Location-based augmented reality on mobile phones , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[24]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[25]  Eric Brachmann,et al.  Learning Less is More - 6D Camera Localization via 3D Surface Regression , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Ian D. Reid,et al.  Automatic Relocalization and Loop Closing for Real-Time Monocular SLAM , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Torsten Sattler,et al.  InLoc: Indoor Visual Localization with Dense Matching and View Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Matthias Nießner,et al.  SemanticPaint , 2015, ACM Trans. Graph..

[29]  Juho Kannala,et al.  Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization , 2018, ECCV Workshops.

[30]  Dorian Gálvez-López,et al.  Bags of Binary Words for Fast Place Recognition in Image Sequences , 2012, IEEE Transactions on Robotics.

[31]  Nicu Sebe,et al.  Localize Me Anywhere, Anytime: A Multi-task Point-Retrieval Approach , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Torsten Sattler,et al.  Hyperpoints and Fine Vocabularies for Large-Scale Location Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Wolfram Burgard,et al.  Deep Auxiliary Learning for Visual Localization and Odometry , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Shuda Li,et al.  RelocNet: Continuous Metric Learning Relocalisation Using Neural Nets , 2018, ECCV.

[35]  Daniel Cremers,et al.  Image-Based Localization Using LSTMs for Structured Feature Correlation , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Ben Glocker,et al.  Real-Time RGB-D Camera Relocalization via Randomized Ferns for Keyframe Encoding , 2015, IEEE Transactions on Visualization and Computer Graphics.

[37]  Jérôme Royan,et al.  [POSTER] Decision Forest For Efficient and Robust Camera Relocalization , 2017, 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct).

[38]  Pushmeet Kohli,et al.  The Global Patch Collider , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Gérard G. Medioni,et al.  RGB-D camera based wearable navigation system for the visually impaired , 2016, Comput. Vis. Image Underst..

[40]  Andrew Zisserman,et al.  Multiple View Geometry in Computer Vision (2nd ed) , 2003 .

[41]  Philip H. S. Torr,et al.  Collaborative Large-Scale Dense 3D Reconstruction with Online Inter-Agent Pose Optimisation , 2018, IEEE Transactions on Visualization and Computer Graphics.

[42]  James J. Little,et al.  Backtracking regression forests for accurate camera relocalization , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[43]  Olaf Kähler,et al.  Real-Time Large-Scale Dense 3D Reconstruction with Loop Closure , 2016, ECCV.

[44]  David W. Murray,et al.  Video-rate localization in multiple maps for wearable augmented reality , 2008, 2008 12th IEEE International Symposium on Wearable Computers.

[45]  Walterio W. Mayol-Cuevas,et al.  6D Relocalisation for RGBD Cameras Using Synthetic View Regression , 2012, BMVC.

[46]  Valérie Gouet-Brunet,et al.  A survey on Visual-Based Localization: On the benefit of heterogeneous data , 2018, Pattern Recognit..

[47]  Olaf Kähler,et al.  InfiniTAM v3: A Framework for Large-Scale 3D Reconstruction with Loop Closure , 2017, ArXiv.

[48]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[49]  Torsten Sattler,et al.  Semantic Visual Localization , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  James J. Little,et al.  Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization , 2017, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[51]  Matthias Nießner,et al.  Learning to Navigate the Energy Landscape , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[52]  Sterling Wu,et al.  E-learning: less is more , 2017 .

[53]  Mani Golparvar-Fard,et al.  Fast and scalable structure-from-motion based localization for high-precision mobile augmented reality systems , 2016, mUX: The Journal of Mobile User Experience.

[54]  Jiri Matas,et al.  Locally Optimized RANSAC , 2003, DAGM-Symposium.

[55]  Andrew W. Fitzgibbon,et al.  Exploiting uncertainty in regression forests for accurate camera relocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Eric Brachmann,et al.  DSAC — Differentiable RANSAC for Camera Localization , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Wolfram Burgard,et al.  VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry , 2018, IEEE Robotics and Automation Letters.

[58]  Eric Brachmann,et al.  Random forests versus Neural Networks — What's best for camera localization? , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[59]  Toby Sharp,et al.  Implementing Decision Trees and Forests on a GPU , 2008, ECCV.

[60]  Nicolas Padoy,et al.  Marker-Less AR in the Hybrid Room Using Equipment Detection for Camera Relocalization , 2015, MICCAI.

[61]  Roberto Cipolla,et al.  Modelling uncertainty in deep learning for camera relocalization , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[62]  Stephen J. Wright Coordinate descent algorithms , 2015, Mathematical Programming.

[63]  James J. Little,et al.  Exploiting Random RGB and Sparse Features for Camera Pose Estimation , 2016, BMVC.

[64]  Juho Kannala,et al.  Full-Frame Scene Coordinate Regression for Image-Based Localization , 2018, Robotics: Science and Systems.

[65]  Andrew Calway,et al.  RGBD relocalisation using pairwise geometry and concise key point sets , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[66]  Luigi di Stefano,et al.  On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[68]  D. Marquardt An Algorithm for Least-Squares Estimation of Nonlinear Parameters , 1963 .

[69]  Juan D. Tardós,et al.  Fast relocalisation and loop closing in keyframe-based SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[70]  Sen Wang,et al.  VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).