Unlocking the urban photographic record through 4D scene modeling

Vast collections of historical photographs are being digitally archived and placed online, providing an objective record of the last two centuries that remains largely untapped. We propose that time-varying 3D models can pull together and index large collections of images while also serving as a tool of historical discovery, revealing new information about the locations, dates, and contents of historical images. In particular, our goal is to use computer vision techniques to tie together a large set of historical photographs of a given city into a consistent 4D model of the city: a 3D model with time as an additional dimension. To extract 4D city models from historical images, we must infer the positions of the cameras and the structure of the scene in both space and time. Traditional structure-from-motion techniques can address the spatial problem; here we focus on inferring temporal information: a date for each image and a time interval during which each structural element in the scene persists. We first formulate this task as a constraint satisfaction problem based on the visibility of structural elements in each image, which yields a temporal ordering of the images. Next, we present methods to incorporate actual date information into the temporal inference. Finally, we present a general probabilistic framework for estimating all temporal variables in structure-from-motion problems, including an unknown date for each camera and an unknown time interval for each structural element. Given a collection of images whose dates are mostly unknown or uncertain, we can use this framework to recover the dates of all images automatically by reasoning probabilistically about the visibility and existence of objects in the scene. We present results for collections of hundreds of historical images of cities, including Manhattan and downtown Atlanta, taken over several decades.
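The visibility-based ordering idea can be illustrated with a small sketch. It assumes a hypothetical visibility matrix whose rows are scene points and whose columns are images, encoded as 1 (observed), 0 (should be visible but is missing) or None (no evidence, e.g. occluded or out of view). Because each point is assumed to persist over a single time interval, an image ordering is consistent only if no point disappears and then reappears. The names `violations` and `consistent_orderings` and the encoding are illustrative assumptions, not the implementation described here, and the brute-force enumeration only works for a handful of images; a collection of hundreds of images would need a heuristic search instead.

```python
from itertools import permutations
from typing import Optional, Sequence

# Hypothetical encoding (an assumption for illustration only):
#   1    -> the point is observed in the image
#   0    -> the point should be visible (in view, not occluded) but is missing
#   None -> no evidence (outside the field of view, or occluded)
Visibility = Sequence[Sequence[Optional[int]]]


def violations(vis: Visibility, order: Sequence[int]) -> int:
    """Count points whose existence would not form one contiguous interval.

    Along the given image order, a point violates the constraint if a 0
    lies between two of its 1s: it would have to vanish and then reappear.
    Entries equal to None impose no constraint and are skipped.
    """
    count = 0
    for row in vis:
        seq = [row[j] for j in order if row[j] is not None]
        ones = [k for k, v in enumerate(seq) if v == 1]
        if ones and 0 in seq[ones[0]:ones[-1] + 1]:
            count += 1
    return count


def consistent_orderings(vis: Visibility) -> list[tuple[int, ...]]:
    """Enumerate every image order with zero violations (tiny inputs only)."""
    n_images = len(vis[0])
    return [p for p in permutations(range(n_images)) if violations(vis, p) == 0]


# Three hypothetical scene points (rows) observed across four images (columns).
vis = [
    [1, 1, 0, 0],     # point A: seen in images 0 and 1 only
    [0, 1, 1, None],  # point B: seen in images 1 and 2; no evidence in image 3
    [0, 0, 1, 1],     # point C: seen in images 2 and 3 only
]
print(consistent_orderings(vis))
# [(0, 1, 2, 3), (0, 1, 3, 2), (2, 3, 1, 0), (3, 2, 1, 0)]
```

Every consistent ordering appears together with its reverse, because the contiguity constraint is symmetric in time: visibility alone cannot say which end of the sequence is earlier, which is one reason actual dates are needed to anchor the result. Images 2 and 3 can also trade places, since point B carries no evidence in image 3.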