Photo tourism: exploring photo collections in 3D

We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph, a sparse 3D model of the scene, and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to transition smoothly between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations and to annotate image details, with annotations automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo-sharing sites.
