6D Camera Relocalization in Ambiguous Scenes via Continuous Multimodal Inference

We present a multimodal camera relocalization framework that captures ambiguities and uncertainties with continuous mixture models defined on the manifold of camera poses. In highly ambiguous environments, which can easily arise from symmetries and repetitive structures in the scene, computing a single plausible solution (what most state-of-the-art methods currently regress) may not be sufficient. Instead, we predict multiple camera pose hypotheses together with the uncertainty of each prediction. To this end, we use an end-to-end deep neural network that models the orientation of the camera pose with Bingham distributions and the position with a multivariate Gaussian. By incorporating a Winner-Takes-All training scheme, we obtain a mixture model that is well suited to explaining ambiguities in the scene, yet does not suffer from mode collapse, a common problem with mixture density networks. We introduce a new dataset specifically designed to foster camera localization research in ambiguous environments, and we exhaustively evaluate our method on synthetic and real data, on both ambiguous scenes and non-ambiguous benchmark datasets. We plan to release our code and dataset.
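To make the Winner-Takes-All idea concrete, the following is a minimal sketch in Python, not the implementation used in this work. It assumes K pose hypotheses, each a unit quaternion plus a 3D position, and scores them with a simple stand-in error (an antipodally symmetric rotation term plus squared position error) rather than the Bingham and Gaussian likelihoods described above. The function names and the `eps` relaxation parameter are hypothetical; with `eps=0` only the best hypothesis contributes to the loss.

```python
import numpy as np

def hypothesis_errors(pred_q, pred_t, gt_q, gt_t):
    """Per-hypothesis error (a stand-in for a negative log-likelihood).

    pred_q: (K, 4) unit quaternions, pred_t: (K, 3) positions,
    gt_q: (4,) unit quaternion, gt_t: (3,) position.
    """
    # q and -q encode the same rotation, so use |<q, q_gt>|.
    rot_err = 1.0 - np.abs(pred_q @ gt_q)
    trans_err = np.sum((pred_t - gt_t) ** 2, axis=1)
    return rot_err + trans_err

def wta_loss(errors, eps=0.0):
    """Winner-Takes-All loss over K hypotheses.

    eps=0 gives plain WTA (only the best hypothesis is penalised).
    eps>0 is an assumed relaxation: the winner gets weight 1-eps and
    the remaining hypotheses share eps equally.
    """
    k = errors.shape[0]
    winner = int(np.argmin(errors))
    if k == 1 or eps == 0.0:
        return float(errors[winner]), winner
    weights = np.full(k, eps / (k - 1))
    weights[winner] = 1.0 - eps
    return float(weights @ errors), winner

if __name__ == "__main__":
    # Two hypotheses: identity rotation and a 180-degree turn about z.
    pred_q = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
    pred_t = np.zeros((2, 3))
    gt_q = np.array([-1.0, 0.0, 0.0, 0.0])  # same rotation as identity
    gt_t = np.zeros(3)
    errs = hypothesis_errors(pred_q, pred_t, gt_q, gt_t)
    print(wta_loss(errs), wta_loss(errs, eps=0.1))
```

Because only the winning hypothesis receives (most of) the training signal for each sample, different hypotheses are free to specialise on different plausible poses of an ambiguous view, which is the intuition behind avoiding mode collapse.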
