Photo tourism: exploring photo collections in 3D

We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph, as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo-sharing sites.
