Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation

In this work, we introduce Deep Bingham Networks (DBN), a generic framework that can naturally handle pose-related uncertainties and ambiguities arising in almost all real-life applications concerning 3D data. While existing works strive to find a single solution to the pose estimation problem, we accept the ambiguities that make it highly uncertain which solution should be identified as the best. Instead, we report a family of poses that captures the nature of the solution space. DBN extends state-of-the-art direct pose regression networks with (i) a multi-hypothesis prediction head that can yield different distribution modes; and (ii) novel loss functions that benefit from Bingham distributions on rotations. This way, DBN can work both in unambiguous cases, where it provides uncertainty information, and in ambiguous scenes, where an uncertainty per mode is desired. On the technical front, our network regresses continuous Bingham mixture models and is applicable both to 2D data such as images and to 3D data such as point clouds. We propose new training strategies to avoid mode or posterior collapse during training and to improve numerical [...]

Authors:
H. Deng* (haowen.deng@tum.de)
M. Bui* (mai.bui@tum.de)
N. Navab (nassir.navab@tum.de)
L. Guibas (guibas@cs.stanford.edu)
S. Ilic (slobodan.ilic@tum.de)
T. Birdal [0000-0001-7915-7964] (t.birdal@stanford.edu)
* shared first authorship

Affiliations:
1. Informatics at Technische Universität München, Munich, Germany
2. Corporate Technology Siemens AG, Munich, Germany
3. Computer Science Department, Stanford University, CA, USA

[Figure: panels arranged along two axes, Uncertainty / No Uncertainty and Ambiguity / No Ambiguity]
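As a rough illustration of why Bingham distributions suit rotations, the following minimal sketch evaluates the unnormalized Bingham log-density on unit quaternions. It is not the authors' implementation: the function name, the parameter convention (orthonormal columns in V, non-positive concentrations in lambdas with the first fixed to zero so that the first column is the mode), and the example values are assumptions made for illustration, and the normalizing constant is deliberately omitted.

```python
import numpy as np


def bingham_log_density_unnormalized(q, V, lambdas):
    """Return q^T V diag(lambdas) V^T q for a unit quaternion q.

    This is the Bingham log-density up to its (omitted) normalizing constant.
    Assumed convention: V is a 4x4 orthogonal matrix and lambdas <= 0.
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)      # project onto the unit sphere S^3
    proj = V.T @ q                 # coordinates of q in the basis given by V
    return float(np.sum(lambdas * proj ** 2))


# Hypothetical parameters: mode at the identity quaternion, equal spread elsewhere.
V = np.eye(4)
lambdas = np.array([0.0, -10.0, -10.0, -10.0])

mode = np.array([1.0, 0.0, 0.0, 0.0])
other = np.array([0.0, 1.0, 0.0, 0.0])

log_p_mode = bingham_log_density_unnormalized(mode, V, lambdas)
log_p_antipode = bingham_log_density_unnormalized(-mode, V, lambdas)
log_p_other = bingham_log_density_unnormalized(other, V, lambdas)

print(log_p_mode, log_p_antipode, log_p_other)
```

Because the density depends on q only through squared projections, q and -q receive the same value, which matches the fact that both quaternions encode the same rotation. A mixture of such components, one per hypothesis, is one way to picture the multimodal output described above.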
