Camera Pose Estimation with Semantic 3D Model

In computer vision, estimating camera pose from correspondences between 3D geometric entities and their projections into the image is a widely investigated problem. Most state-of-the-art methods exploit simple primitives such as points or lines, and thus require dense scene models. However, the emergence of very effective CNN-based object detectors in recent years has paved the way for much lighter 3D models composed solely of a few semantically relevant features. In that context, we propose a novel model-based camera pose estimation method in which the scene is modeled by a set of virtual ellipsoids. We show that the 6-DoF camera pose can be determined by optimizing only the three orientation parameters, and that at least two correspondences between 3D ellipsoids and their 2D projections are necessary in practice. We validate the approach in both simulated and real environments.
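To make the notion of a correspondence between a 3D ellipsoid and its 2D projection concrete, the following minimal sketch uses the standard projective-geometry relation C* = P Q* P^T, which maps the dual quadric Q* of an ellipsoid to the dual conic C* of its image ellipse under a pinhole camera P = K [R | t]. It is an illustration only, not the pose estimation method proposed here; the function names and the NumPy-based layout are assumptions introduced for this example.

```python
import numpy as np

def ellipsoid_dual_quadric(center, semi_axes, orientation=None):
    """Return the 4x4 dual quadric Q* of an ellipsoid (hypothetical helper).

    center: 3-vector, semi_axes: (a, b, c), orientation: 3x3 rotation (identity if None).
    """
    R = np.eye(3) if orientation is None else np.asarray(orientation, dtype=float)
    a, b, c = semi_axes
    # Dual quadric of the axis-aligned ellipsoid centered at the origin.
    Q_canonical = np.diag([a**2, b**2, c**2, -1.0])
    # Rigid transform placing the ellipsoid in the world frame.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = center
    # Dual quadrics transform as Q* -> T Q* T^T.
    return T @ Q_canonical @ T.T

def project_to_dual_conic(Q_dual, K, R_cw, t_cw):
    """Project a dual quadric with P = K [R | t]; normalize so C*[2, 2] = -1."""
    P = K @ np.hstack([R_cw, np.asarray(t_cw, dtype=float).reshape(3, 1)])
    C = P @ Q_dual @ P.T
    return C / -C[2, 2]

def ellipse_center(C_dual):
    """Center of the image ellipse, read from the last column of the dual conic."""
    return C_dual[:2, 2] / C_dual[2, 2]

# Unit sphere 5 units in front of a camera with identity intrinsics and pose.
Q = ellipsoid_dual_quadric(np.array([0.0, 0.0, 5.0]), (1.0, 1.0, 1.0))
C = project_to_dual_conic(Q, np.eye(3), np.eye(3), np.zeros(3))
print(ellipse_center(C))  # image center of the projected ellipse
print(C[0, 0])            # squared image radius for this centered sphere
```

In this example the projected outline is a circle centered on the principal point, and its squared radius 1/24 agrees with the tangent-cone geometry of a unit sphere at distance 5. An object detector would supply the observed ellipse (for instance, one inscribed in a bounding box), and a pose estimator would seek the camera for which projected and observed ellipses agree.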
