Estimating camera pose from a single urban ground-view omnidirectional image and a 2D building outline map

A framework is presented for estimating the pose of a camera using images extracted from a single omnidirectional image of an urban scene, given a 2D map of building outlines that contains neither 3D geometric information nor appearance data. Through vanishing point analysis, the framework identifies candidate vertical corner edges of buildings in the query image, which we term VCLH, together with the normals of the neighboring planes. A bottom-up process then groups the VCLH into elemental planes and subsequently into 3D structural fragments, each recovered only up to a similarity transformation. A geometric hashing lookup allows us to rapidly establish multiple candidate correspondences between the structural fragments and the building contours in the 2D map. A voting-based camera pose estimation method is then employed to recover the correspondences that admit a camera pose solution with high consensus. On a dataset that is challenging even for humans, the system ranked the correct match within the top 30 of 3600 camera pose hypotheses (0.83% selectivity) for 50.9% of queries.
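The geometric hashing lookup can be illustrated with a minimal 2D sketch. It assumes, for illustration only, that each structural fragment and each building contour has been reduced to the top-down positions of its corners, and that a fragment is known only up to a 2D similarity (rotation, translation and uniform scale). Coordinates expressed in a frame defined by an ordered pair of corners are then similarity-invariant, so they can be quantized and used as hash keys. The functions `similarity_coords`, `build_table` and `query`, the bin size `q` and the example outlines are hypothetical and are not taken from the paper.

```python
import itertools
import math
from collections import Counter, defaultdict

Point = tuple[float, float]
Key = tuple[int, int]
Entry = tuple[str, int, int]  # (building id, basis index i, basis index j)


def similarity_coords(p: Point, b0: Point, b1: Point) -> Point:
    """Coordinates of p in the frame that sends b0 to (0, 0) and b1 to (1, 0).

    Applying the same 2D rotation, translation and uniform scale to p, b0 and b1
    leaves these coordinates unchanged.
    """
    dx, dy = b1[0] - b0[0], b1[1] - b0[1]
    d2 = dx * dx + dy * dy
    px, py = p[0] - b0[0], p[1] - b0[1]
    u = (px * dx + py * dy) / d2  # component along the basis vector
    v = (py * dx - px * dy) / d2  # component along the basis vector rotated by +90 degrees
    return u, v


def quantize(u: float, v: float, q: float) -> Key:
    return round(u / q), round(v / q)


def build_table(outlines: dict[str, list[Point]], q: float = 0.1) -> dict[Key, list[Entry]]:
    """Offline: hash every outline corner under every ordered pair of corners as basis."""
    table: dict[Key, list[Entry]] = defaultdict(list)
    for bid, pts in outlines.items():
        for i, j in itertools.permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    table[quantize(*similarity_coords(p, pts[i], pts[j]), q)].append((bid, i, j))
    return table


def query(table: dict[Key, list[Entry]], fragment: list[Point],
          q: float = 0.1, top: int = 3) -> list[tuple[Entry, int]]:
    """Online: use the fragment's first two corners as basis and vote for map entries."""
    votes: Counter = Counter()
    b0, b1 = fragment[0], fragment[1]
    for p in fragment[2:]:
        for entry in table.get(quantize(*similarity_coords(p, b0, b1), q), []):
            votes[entry] += 1
    return votes.most_common(top)


def apply_similarity(p: Point, s: float, theta: float, t: Point) -> Point:
    c, sn = math.cos(theta), math.sin(theta)
    return s * (c * p[0] - sn * p[1]) + t[0], s * (sn * p[0] + c * p[1]) + t[1]


# Hypothetical building outlines (corner coordinates in map units).
outlines = {
    "A": [(0, 0), (4, 0), (4, 2), (0, 2)],
    "B": [(0, 0), (1, 0), (0.3, 3)],
}
table = build_table(outlines)
# A hypothetical fragment: building A seen up to an unknown similarity.
fragment = [apply_similarity(p, 1.5, math.radians(30), (10, -5)) for p in outlines["A"]]
print(query(table, fragment))  # top entries belong to building "A", each with 2 votes
```

The tie between the (0, 1) and (2, 3) bases of building A reflects the rectangle's 180-degree rotational symmetry. A fuller version would try several fragment bases and tolerate quantization noise, for example by also voting in neighboring bins.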
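The voting step can be read as a Hough-style accumulator over camera poses. The sketch below is one possible realization, not the paper's procedure. It assumes that each candidate correspondence from the hashing lookup pairs two fragment corners with two map corners, that the fragment frame is centered on the camera with its x-axis as the heading reference, and that poses are binned on a coarse (x, y, heading) grid. The names `pose_from_candidate` and `vote_poses`, the bin sizes and all data are hypothetical.

```python
import cmath
import math
from collections import Counter

# A candidate correspondence: fragment corners (f0, f1) matched to map corners (m0, m1),
# with 2D points written as complex numbers x + yj.
Candidate = tuple[complex, complex, complex, complex]


def pose_from_candidate(f0: complex, f1: complex, m0: complex, m1: complex) -> tuple[complex, float]:
    """Fit the similarity z -> a*z + b that sends f0 to m0 and f1 to m1.

    Assuming the camera sits at the origin of the fragment frame, its map position is
    a*0 + b = b, and its heading is the rotation angle arg(a) in radians.
    """
    a = (m1 - m0) / (f1 - f0)
    b = m0 - a * f0
    return b, cmath.phase(a)


def vote_poses(candidates: list[Candidate], cell: float = 1.0,
               ang_bin_deg: float = 10.0, top: int = 3) -> list[tuple[tuple[int, int, int], int]]:
    """Accumulate one vote per candidate in a quantized (x, y, heading) grid."""
    n_ang = round(360 / ang_bin_deg)
    votes: Counter = Counter()
    for cand in candidates:
        pos, heading = pose_from_candidate(*cand)
        key = (round(pos.real / cell), round(pos.imag / cell),
               round(math.degrees(heading) / ang_bin_deg) % n_ang)
        votes[key] += 1
    return votes.most_common(top)


# Hypothetical ground truth: camera at map position (20.3, 7.2), heading 40 degrees, scale 2.
a_true = 2 * cmath.exp(1j * math.radians(40))
b_true = 20.3 + 7.2j
F = [3 + 1j, 5 - 2j, -4 + 6j, 1 + 8j]  # fragment corners, camera-centered
M = [a_true * f + b_true for f in F]    # the same corners on the map
candidates = [
    (F[0], F[1], M[0], M[1]),  # correct
    (F[2], F[3], M[2], M[3]),  # correct
    (F[0], F[2], M[0], M[2]),  # correct
    (F[0], F[1], M[2], M[3]),  # wrong pairing
    (F[1], F[3], M[0], M[2]),  # wrong pairing
]
print(vote_poses(candidates)[0])  # -> ((20, 7, 4), 3)
```

Correct pairings agree on one pose and accumulate in a single bin, while wrong pairings scatter across the grid. Coarse bins tolerate noise but can merge nearby poses, and votes that straddle a bin edge would need extra handling in practice.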
