3D Urban Scene Modeling Integrating Recognition and Reconstruction

Abstract Supplying realistically textured 3D city models at ground level promises to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this pre-visualization can be rendered from the expected future viewpoints of the driver, the required maneuver will be more easily understandable. 3D city models can be reconstructed from the imagery recorded by surveying vehicles. The vastness of image material gathered by these vehicles, however, puts extreme demands on vision algorithms to ensure their practical usability. Algorithms need to be as fast as possible and should result in compact, memory efficient 3D city models for future ease of distribution and visualization. For the considered application, these are not contradictory demands. Simplified geometry assumptions can speed up vision algorithms while automatically guaranteeing compact geometry models. In this paper, we present a novel city modeling framework which builds upon this philosophy to create 3D content at high speed. Objects in the environment, such as cars and pedestrians, may however disturb the reconstruction, as they violate the simplified geometry assumptions, leading to visually unpleasant artifacts and degrading the visual realism of the resulting 3D city model. Unfortunately, such objects are prevalent in urban scenes. We therefore extend the reconstruction framework by integrating it with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D. The two components of our system are tightly integrated and benefit from each other’s continuous input. 3D reconstruction delivers geometric scene context, which greatly helps improve detection precision. The detected car locations, on the other hand, are used to instantiate virtual placeholder models which augment the visual realism of the reconstructed city model.

[1]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[2]  David Nistér,et al.  An efficient solution to the five-point relative pose problem , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Luc Van Gool,et al.  Fast Compact City Modeling for Navigation Pre-Visualization , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Roberto Cipolla,et al.  Combining single view recognition and multiple view stereo for architectural scenes , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[5]  Suya You,et al.  Approaches to Large-Scale Urban Modeling , 2003, IEEE Computer Graphics and Applications.

[6]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, CVPR 2004.

[7]  George Vosselman,et al.  3D BUILDING MODEL RECONSTRUCTION FROM POINT CLOUDS AND GROUND PLANS , 2001 .

[8]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Bernt Schiele,et al.  Pedestrian detection in crowded scenes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  C. Brenner,et al.  AN INTEGRATED SYSTEM FOR URBAN MODEL GENERATION , 2000 .

[11]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  Hans-Gerd Maas,et al.  The suitability of airborne laser scanner data for automatic 3D object reconstruction , 2001 .

[13]  Ramakant Nevatia,et al.  Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  SchieleBernt,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008 .

[15]  Frederic Devernay,et al.  Using robust methods for automatic extraction of buildings , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[16]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[17]  Luc Van Gool,et al.  Dynamic 3D Scene Analysis from a Moving Vehicle , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Yizhou Yu,et al.  Efficient View-Dependent Image-Based Rendering with Projective Texture-Mapping , 1998, Rendering Techniques.

[19]  Ruzena Bajcsy,et al.  Segmentation of range images as the search for geometric parametric models , 1995, International Journal of Computer Vision.

[20]  Luc Van Gool,et al.  Integrating Recognition and Reconstruction for Cognitive Traffic Scene Analysis from a Moving Vehicle , 2006, DAGM-Symposium.

[21]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[22]  Ioannis Stamos,et al.  3-D model construction using range and image data , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[23]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Joonki Paik,et al.  3D reconstruction of indoor and outdoor scenes using a mobile range scanner , 2002, Object recognition supported by user interaction for service robots.

[25]  Andrew Zisserman,et al.  Multiple view geometry in computer visiond , 2001 .

[26]  Christian Früh,et al.  Data Processing Algorithms for Generating Textured 3D Building Facade Meshes from Laser Scans and Camera Images , 2005, International Journal of Computer Vision.

[27]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, CVPR.

[28]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[29]  Luc Van Gool,et al.  3D City Modeling Using Cognitive Loops , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[30]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[31]  Olga Veksler,et al.  Fast variable window for stereo correspondence using integral images , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[32]  M. Wolf,et al.  Photogrammetric Data Capture and Calculation for 3D City Models , 2000 .

[33]  Ruigang Yang,et al.  Multi-resolution real-time stereo on commodity graphics hardware , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[34]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[35]  Xinhua Zhuang,et al.  Pose estimation from corresponding point data , 1989, IEEE Trans. Syst. Man Cybern..

[36]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[37]  C. Brenner FAST PRODUCTION OF VIRTUAL REALITY CITY MODELS , 2003 .

[38]  Bernt Schiele,et al.  Segmentation Based Multi-Cue Integration for Object Detection , 2006, BMVC.

[39]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[40]  Christian Früh,et al.  3D model generation for cities using aerial photographs and ground level laser scans , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[41]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[42]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[43]  Bernt Schiele,et al.  Multiple Object Class Detection with a Generative Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[44]  Luc Van Gool,et al.  Real-time connectivity constrained depth map computation using programmable graphics hardware , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[45]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[46]  ARMIN GRUEN Automation in Building Reconstruction , 2000 .