Down to Earth: Using Semantics for Robust Hypothesis Selection for the Five-Point Algorithm

The computation of the essential matrix using the five-point algorithm is a staple task that is usually considered solved. However, we show that the algorithm frequently selects erroneous solutions in the presence of noise and outliers. These errors arise when the supporting point correspondences supplied to the algorithm do not adequately cover all essential planes in the scene, leading to ambiguous essential matrix solutions. This is not merely a theoretical problem: such scene conditions often occur in 3D reconstruction of real-world data when fronto-parallel point correspondences, such as points on building facades, are captured but correspondences on obliquely observed planes, such as the ground plane, are missed. To address this problem, we propose to leverage semantic labelings of image features to guide hypothesis selection in the five-point algorithm. More specifically, we propose a two-stage RANSAC procedure. In the first stage, only features classified as ground points are processed, yielding a set of inlier ground features. In the second stage, these inlier ground features are used to score two-view geometry hypotheses generated by the five-point algorithm from samples of non-ground points. Results for scenes with prominent ground regions demonstrate the ability of our approach to recover epipolar geometries that describe the entire scene, rather than only well-sampled scene planes.
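The two-stage procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `sampson_error`, `inliers`, and `two_stage_select`, the Sampson-error inlier test, and the threshold and iteration defaults are all hypothetical. The abstract does not say which model is fitted to the ground features in the first stage, so the sketch assumes the same minimal solver is used in both stages. The solver itself is passed in as a callable `solve` that maps five correspondences to a list of candidate 3x3 essential matrices; in practice it could wrap any five-point implementation.

```python
import random


def sampson_error(E, p1, p2):
    """First-order (Sampson) epipolar error of the match p1 -> p2 under E.

    Points are (x, y) in normalized image coordinates; E is a 3x3 nested list.
    """
    x1 = (p1[0], p1[1], 1.0)
    x2 = (p2[0], p2[1], 1.0)
    Ex1 = [sum(E[i][k] * x1[k] for k in range(3)) for i in range(3)]
    Etx2 = [sum(E[k][i] * x2[k] for k in range(3)) for i in range(3)]
    residual = sum(x2[i] * Ex1[i] for i in range(3))
    denom = Ex1[0] ** 2 + Ex1[1] ** 2 + Etx2[0] ** 2 + Etx2[1] ** 2
    return residual ** 2 / denom if denom > 0 else float("inf")


def inliers(E, matches, thresh):
    """Return the matches whose Sampson error under E is below thresh."""
    return [m for m in matches if sampson_error(E, m[0], m[1]) < thresh]


def two_stage_select(ground, non_ground, solve, thresh=1e-4, iters=50, seed=0):
    """Hypothetical two-stage hypothesis selection.

    ground, non_ground: lists of ((x1, y1), (x2, y2)) matches, split by
    semantic label; each list needs at least five entries.
    solve: callable taking five matches and returning candidate matrices.
    """
    rng = random.Random(seed)

    # Stage 1: RANSAC restricted to ground matches, to find ground inliers.
    # Assumption: the model fitted here is also an essential matrix.
    ground_inliers = []
    for _ in range(iters):
        for E in solve(rng.sample(ground, 5)):
            candidate = inliers(E, ground, thresh)
            if len(candidate) > len(ground_inliers):
                ground_inliers = candidate

    # Stage 2: hypotheses come from non-ground samples, but each one is
    # scored by how many stage-1 ground inliers it explains.
    best_E, best_score = None, -1
    for _ in range(iters):
        for E in solve(rng.sample(non_ground, 5)):
            score = len(inliers(E, ground_inliers, thresh))
            if score > best_score:
                best_E, best_score = E, score
    return best_E, ground_inliers
```

The design point this sketch tries to capture is that a hypothesis fitted to facade-like points must also agree with the ground features to be selected, so a solution that only explains the well-sampled plane receives a low score.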
