From Google Street View to 3D city models

We present a structure-from-motion (SfM) pipeline for visual 3D modeling of a large city area using Google Street View images with a 360° field of view. The core of the pipeline combines state-of-the-art techniques: SURF feature detection, tentative matching by approximate nearest-neighbour search, relative camera motion estimation by solving the minimal 5-point relative pose problem, and sparse bundle adjustment. Robust, stable camera poses provide a high-quality initial structure for bundle adjustment. These poses are estimated by PROSAC with soft voting, and their scale is selected using a visual cone test. Furthermore, the pipeline searches for trajectory loops based on co-occurring visual words and closes them by adding new constraints to the bundle adjustment, which enforces the global consistency of camera poses and 3D structure in the sequence. We present a large-scale reconstruction computed from 4,799 images of the Google Street View Pittsburgh Research Data Set.
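The loop search based on co-occurring visual words can be illustrated with a short sketch. This is not the authors' implementation. It assumes that each image has already been quantised into a list of visual-word IDs. It scores image pairs with tf-idf weighted cosine similarity, in the spirit of Video Google retrieval. The function names and the `min_gap` and `threshold` parameters are hypothetical and were chosen only for this example.

```python
import math
from collections import Counter


def tfidf_vectors(images):
    """Turn each image (a list of visual-word ids) into an L2-normalised tf-idf dict."""
    n = len(images)
    # Document frequency: number of images in which each visual word occurs.
    df = Counter()
    for words in images:
        df.update(set(words))
    vectors = []
    for words in images:
        tf = Counter(words)
        total = len(words)
        # Words seen in every image get idf = log(1) = 0 and carry no weight.
        vec = {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        if norm > 0:
            vec = {w: v / norm for w, v in vec.items()}
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Dot product of two normalised sparse vectors (iterate over the smaller one)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def loop_candidates(images, min_gap=10, threshold=0.3):
    """Hypothetical helper: return (i, j, score) for visually similar images that
    are at least min_gap frames apart, so that temporal neighbours are ignored."""
    vecs = tfidf_vectors(images)
    candidates = []
    for i in range(len(vecs)):
        for j in range(i + min_gap, len(vecs)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                candidates.append((i, j, score))
    # Strongest appearance matches first.
    candidates.sort(key=lambda t: t[2], reverse=True)
    return candidates
```

In a full pipeline, pairs that pass this appearance test would typically still be verified geometrically, for example by relative pose estimation. Only then would they be added as loop-closure constraints to the bundle adjustment.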
