3D Town: The Automatic Urban Awareness Project

The 3DTown project focuses on developing a distributed system for sensing, interpreting and visualizing the real-time dynamics of urban life within the 3D context of a city. At the heart of this technology lies a set of core algorithms that automatically integrate 3D urban models with data from pan/tilt video cameras, environmental sensors and other real-time information sources. A key challenge is the "three-dimensionalization" of pedestrians and vehicles tracked in 2D camera video, which requires automatic, real-time computation of camera pose relative to the 3D urban environment. In this paper we report preliminary results from a prototype system, 3DTown, composed of discrete modules connected through predetermined communication protocols. Currently, these modules consist of: 1) a 3D modeling module that efficiently reconstructs building models and integrates them with indoor architectural plans; 2) a GeoWeb server that indexes a 3D urban database to render perspective views of both outdoor and indoor environments from any requested vantage point; 3) sensor modules that receive and distribute real-time data; 4) tracking modules that detect and track pedestrians and vehicles in urban spaces and on access highways; 5) camera pose modules that automatically estimate camera pose relative to the urban environment; 6) three-dimensionalization modules that combine information from the GeoWeb server and the tracking and camera pose modules to back-project image tracks and geolocate pedestrians and vehicles within the 3D model; 7) an animation module that represents geolocated dynamic agents as sprites; and 8) a web-based visualization module that lets users explore the resulting dynamic 3D visualization in several ways. To demonstrate the system, we used a blend of automatic and semi-automatic methods to construct a rich and accurate 3D model of a university campus, including both outdoor and indoor detail.
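The back-projection performed by the three-dimensionalization modules can be illustrated with a minimal sketch. It assumes a calibrated pinhole camera with zero skew and known pose, and, as a deliberate simplification, intersects the viewing ray with a flat ground plane z = 0 rather than with the full 3D model served by the GeoWeb server. The function and parameter names (`back_project_to_ground`, `fx`, `fy`, `cx`, `cy`, `R`, `C`) are hypothetical and are not taken from 3DTown.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transpose(m: Mat3) -> Tuple[Vec3, ...]:
    """Transpose a 3x3 matrix."""
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def back_project_to_ground(u: float, v: float,
                           fx: float, fy: float, cx: float, cy: float,
                           R: Mat3, C: Vec3,
                           ground_z: float = 0.0) -> Optional[Vec3]:
    """Intersect the viewing ray through pixel (u, v) with the plane z = ground_z.

    Hypothetical helper. Assumes a zero-skew pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy); R is the world-to-camera rotation
    (camera coords = R (X - C)) and C is the camera centre in world coordinates.
    Returns None if the ray never reaches the plane in front of the camera.
    """
    # Ray direction in camera coordinates: K^{-1} [u, v, 1]^T.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate the direction into world coordinates: d_world = R^T d_cam.
    d_world = mat_vec(transpose(R), d_cam)
    if abs(d_world[2]) < 1e-12:
        return None  # ray is parallel to the ground plane
    # Solve C_z + t * d_z = ground_z for the ray parameter t.
    t = (ground_z - C[2]) / d_world[2]
    if t <= 0:
        return None  # intersection lies behind the camera
    return tuple(C[i] + t * d_world[i] for i in range(3))
```

For example, for a camera 10 m above the ground looking straight down, `back_project_to_ground(1140, 360, 1000, 1000, 640, 360, [[1, 0, 0], [0, -1, 0], [0, 0, -1]], (0, 0, 10))` returns `(5.0, 0.0, 0.0)`. A ground-plane intersection is only meaningful for image points that actually touch the ground, so one would typically back-project the foot point of a tracked pedestrian or the ground contact point of a vehicle.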
The campus demonstration supports web-based 3D visualization of recorded pedestrian and vehicle traffic on streets and highways, estimates of vehicle speed, and live visualization of pedestrian traffic and temperature data at a particular test site. Having demonstrated the system to hundreds of people, we report informal observations on user reactions, potential application areas and the main challenges that must be addressed to bring the system closer to deployment.
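Once tracks are geolocated in a metric ground frame, a vehicle's speed can be estimated from its timestamped positions. The following is a minimal sketch of one simple approach, not necessarily the method used in 3DTown; it assumes each track is a sequence of `(t, x, y)` points in seconds and metres, and `estimate_speed_kmh` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

# Hypothetical track format: (timestamp in s, x in m, y in m) on the ground plane.
TrackPoint = Tuple[float, float, float]


def estimate_speed_kmh(track: Sequence[TrackPoint]) -> float:
    """Average speed in km/h from the first to the last point of a geolocated track."""
    if len(track) < 2:
        raise ValueError("need at least two track points")
    t0, x0, y0 = track[0]
    t1, x1, y1 = track[-1]
    elapsed_s = t1 - t0
    if elapsed_s <= 0:
        raise ValueError("timestamps must increase along the track")
    # Straight-line displacement between the track endpoints, in metres.
    distance_m = math.hypot(x1 - x0, y1 - y0)
    return distance_m / elapsed_s * 3.6  # m/s -> km/h
```

For a track that moves 50 m in 4 s, the estimate is about 45 km/h. Using end-to-end displacement rather than summing per-frame steps keeps tracking jitter from inflating the estimate, at the cost of underestimating speed on curved paths.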
