Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset)

We propose a novel, large-scale, structure-from-motion framework that advances the state of the art in data scalability from city-scale modeling (millions of images) to world-scale modeling (several tens of millions of images) using just a single computer. The main enabling technology is the use of a streaming-based framework for connected component discovery. Moreover, our system employs an adaptive, online, iconic image clustering approach based on an augmented bag-of-words representation, in order to balance the goals of registration, comprehensiveness, and data compactness. We demonstrate our proposal by operating on a recent publicly available 100 million image crowdsourced photo collection containing images geographically distributed throughout the entire world. Results illustrate that our streaming-based approach does not compromise model completeness, but achieves unprecedented levels of efficiency and scalability.

[1]  Jiri Matas,et al.  Total recall II: Query expansion revisited , 2011, CVPR 2011.

[2]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Andrew Owens,et al.  Discrete-continuous optimization for large-scale structure from motion , 2011, CVPR 2011.

[4]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Leonidas J. Guibas,et al.  Image webs: Computing and exploiting connectivity in image collections , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[8]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[9]  Andrew Zisserman,et al.  Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?" , 2002, ECCV.

[10]  Richard Szeliski,et al.  Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Konrad Schindler,et al.  VocMatch: Efficient Multiview Correspondence for Structure from Motion , 2014, ECCV.

[12]  Richard Szeliski,et al.  Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[13]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, ECCV.

[14]  Jan-Michael Frahm,et al.  PAIGE: PAirwise Image Geometry Encoding for improved efficiency in Structure-from-Motion , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Jan-Michael Frahm,et al.  Building Rome on a Cloudless Day , 2010, ECCV.

[16]  Panu Turcot,et al.  Better matching with fewer features: The selection of useful features in large database recognition problems , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[17]  Johannes Gehrke,et al.  MatchMiner: Efficient Spanning Structure Mining in Large Image Collections , 2012, ECCV.

[18]  Jan-Michael Frahm,et al.  Improved Geometric Verification for Large Scale Landmark Image Collections , 2012, BMVC.

[19]  David Martin,et al.  Street View Motion-from-Structure-from-Motion , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Richard Szeliski,et al.  Building Rome in a day , 2009, ICCV.

[21]  Horst Bischof,et al.  Towards Wiki-based Dense City Modeling , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[22]  Jan-Michael Frahm,et al.  Correcting for Duplicate Scene Structure in Sparse 3D Reconstruction , 2014, ECCV.

[23]  Jan-Michael Frahm,et al.  From single image query to detailed 3D reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Martin Isenburg,et al.  Streaming meshes , 2005, VIS 05. IEEE Visualization, 2005..

[25]  David Nister,et al.  Recent developments on direct relative orientation , 2006 .

[26]  Jan-Michael Frahm,et al.  A Comparative Analysis of RANSAC Techniques Leading to Adaptive Real-Time Random Sample Consensus , 2008, ECCV.

[27]  Changchang Wu,et al.  Towards Linear-Time Incremental Structure from Motion , 2013, 2013 International Conference on 3D Vision.

[28]  David A. Shamma,et al.  The New Data and New Challenges in Multimedia Research , 2015, ArXiv.