Implicit-PDF: Non-Parametric Representation of Probability Distributions on the Rotation Manifold

Single-image pose estimation is a fundamental problem in many vision and robotics tasks, and existing deep learning approaches suffer from incomplete modeling and handling of i) uncertainty about the predictions and ii) symmetric objects with multiple (sometimes infinitely many) correct poses. To this end, we introduce a method to estimate arbitrary, non-parametric distributions on SO(3). Our key idea is to represent the distributions implicitly, with a neural network that estimates the probability given the input image and a candidate pose. Grid sampling or gradient ascent can be used to find the most likely pose, but it is also possible to evaluate the probability at any pose, enabling reasoning about symmetries and uncertainty. This is the most general way of representing distributions on manifolds, and to showcase its rich expressive power, we introduce a dataset of challenging symmetric and nearly-symmetric objects. We require no supervision on pose uncertainty: the model trains with only a single pose per example. Nonetheless, our implicit model is expressive enough to handle complex distributions over 3D poses while still obtaining accurate pose estimates in standard non-ambiguous environments, achieving state-of-the-art performance on the Pascal3D+ and ModelNet10-SO(3) benchmarks. Code, data, and visualizations may be found at implicit-pdf.github.io.
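
The grid-sampling idea described above can be illustrated with a minimal sketch. It is not the released implementation: `toy_score` is a hypothetical stand-in for the neural network, the "image feature" is simply a 3x3 target rotation, the candidate grid is drawn at random rather than from an equivolumetric grid, and the volume of SO(3) is assumed to be π² under the chosen normalization. The sketch scores every candidate pose, normalizes the scores with a softmax to obtain a discrete distribution, converts it to a density, and takes the argmax as the most likely pose.

```python
import numpy as np


def random_rotations(n, rng):
    """Draw n rotation matrices uniformly from SO(3) via normalized Gaussian quaternions."""
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    rot = np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w),
        2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w),
        2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y),
    ], axis=1)
    return rot.reshape(n, 3, 3)


def toy_score(image_feature, rotations):
    """Hypothetical stand-in for the network: one unnormalized log-probability
    per candidate pose. Here the score is 5 * trace(target^T R)."""
    return 5.0 * np.einsum("ij,nij->n", image_feature, rotations)


def pose_distribution(image_feature, rotations):
    """Softmax over the candidate grid, plus a density assuming vol(SO(3)) = pi^2."""
    scores = toy_score(image_feature, rotations)
    scores = scores - scores.max()          # subtract max for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()                    # discrete distribution over grid cells
    density = probs * len(rotations) / np.pi ** 2  # each cell has volume pi^2 / N
    return probs, density


def geodesic_angle(r1, r2):
    """Rotation angle (radians) between two rotation matrices."""
    cos = (np.trace(r1.T @ r2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


rng = np.random.default_rng(0)
target = random_rotations(1, rng)[0]        # plays the role of the input image
grid = random_rotations(5000, rng)          # candidate poses
probs, density = pose_distribution(target, grid)
best = grid[np.argmax(probs)]               # most likely pose on the grid
print("angular error of the argmax pose (rad):", geodesic_angle(best, target))
```

Because the probability is available at every candidate pose, the same `probs` array could also be inspected for multiple modes, which is how a symmetric object would show up in such a representation.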
