Paper information - Optimal Transformation Estimation with Semantic Cues

This paper addresses the problem of estimating the geometric transformation relating two distinct visual modalities (e.g., an image and a map, or a projective structure and a Euclidean 3D model) while relying only on semantic cues, such as semantically segmented regions or object bounding boxes. The proposed approach differs from traditional feature-to-feature correspondence reasoning: starting from semantic regions on one side, we seek their possible corresponding regions on the other, thus constraining the sought geometric transformation. This entails a simultaneous search for the transformation and for the region-to-region correspondences. This paper is the first to derive the conditions that must be satisfied for a convex region, defined by control points, to be transformed inside an ellipsoid. These conditions are formulated as Linear Matrix Inequalities and used within a Branch-and-Prune search to obtain the globally optimal transformation. We tested our approach, under mild initial bound conditions, on two challenging registration problems: (i) aligning a semantically segmented image with a map via a 2D homography; (ii) aligning a projective 3D structure with its Euclidean counterpart.
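To make the region-inside-ellipsoid idea concrete, the following is a minimal sketch under a simplifying assumption: the transformation is affine, y = A x + t, rather than the homography or projective-to-Euclidean upgrade treated in the paper. In the affine case, a convex region spanned by control points maps inside an ellipsoid whenever every transformed control point does, and each such test can be written, via the Schur complement, as a matrix inequality that is linear in (A, t). The function names, the ellipsoid parameterization {y : (y - c)^T P^{-1} (y - c) <= 1}, and the toy data are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def lmi_matrix(P, c, y):
    """Build the Schur-complement matrix [[P, y - c], [(y - c)^T, 1]].

    For a symmetric positive definite P, this matrix is positive
    semidefinite exactly when (y - c)^T P^{-1} (y - c) <= 1, i.e. when
    y lies in the ellipsoid with shape matrix P and center c.
    """
    d = (y - c).reshape(-1, 1)
    return np.block([[P, d], [d.T, np.ones((1, 1))]])


def region_inside_ellipsoid(control_points, A, t, P, c, tol=1e-9):
    """Check that the affine image of a convex region lies in an ellipsoid.

    Assumption (not from the paper): the map is affine, so it suffices
    to test the control points whose convex hull defines the region.
    """
    for x in control_points:
        y = A @ x + t                      # transformed control point
        M = lmi_matrix(P, c, y)            # entries are linear in (A, t)
        if np.linalg.eigvalsh(M).min() < -tol:
            return False                   # inequality violated at this point
    return True


# Toy example: unit square against a unit disc centered at (0.5, 0.5).
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
P = np.eye(2)
c = np.array([0.5, 0.5])

fits = region_inside_ellipsoid(square, np.eye(2), np.zeros(2), P, c)
too_big = region_inside_ellipsoid(square, 2.0 * np.eye(2), np.zeros(2), P, c)
print(fits, too_big)
```

Here the check is evaluated for one fixed candidate (A, t). In a search over a bounded set of transformations, the same inequalities would instead be posed as feasibility constraints on the unknown parameters, which is the role the abstract assigns to the Linear Matrix Inequalities inside the Branch-and-Prune search.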

Luc Van Gool | Danda Pani Paudel | Adlane Habed
