View Clustering of Wide-Baseline N-views for Photo Tourism

View clustering is the problem of finding connected sets of overlapping views in a collection of photographs. The resulting clusters can be used to organize a photo collection, to traverse it, or to estimate 3D structure. For large datasets, deciding content overlap by geometrically matching every image pair via pose estimation is not viable. The problem becomes more acute when the views in the collection are separated by wide baselines, i.e., when the 3D scene is not densely sampled, which increases the computational cost of epipolar geometry estimation and matching. We propose an efficient algorithm for clustering many such weakly overlapping views, based on opportunistic use of epipolar geometry estimation for only a limited number of image pairs. We cast view clustering as finding a tree-structured graph over the views, whose weighted links denote the likelihood of view overlap. The optimization is iterative, starting from a minimum spanning tree based on photometric distances between image pairs. At each iteration, we rule out edges whose views have low confidence of overlap, based on epipolar geometry estimates. The minimum spanning tree is then recomputed, and the process is repeated until the link structure no longer changes. We show results on the 2010 Nokia Grand Challenge Dataset, which contains images with low mutual overlap.
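The iterative procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the photometric distance or the overlap confidence is computed, so `dist`, `overlap_confidence`, and `threshold` are hypothetical placeholders, and Kruskal's algorithm is simply one convenient way to compute the spanning tree. Once edges have been ruled out the graph may become disconnected, so the sketch computes a spanning forest and reports its connected components as the view clusters.

```python
from itertools import combinations


def spanning_forest(n, dist, banned):
    """Kruskal's algorithm over all pairs (i, j), i < j, that are not banned.

    Returns the set of edges of a minimum spanning forest.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (dist[i][j], i, j)
        for i, j in combinations(range(n), 2)
        if (i, j) not in banned
    )
    forest = set()
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.add((i, j))
    return forest


def cluster_views(dist, overlap_confidence, threshold=0.5):
    """Iteratively prune a photometric spanning forest.

    dist: symmetric matrix of photometric distances (assumed given).
    overlap_confidence: hypothetical callback (i, j) -> value in [0, 1],
        standing in for an epipolar-geometry-based overlap check.
    threshold: assumed cutoff below which an edge is ruled out.
    """
    n = len(dist)
    banned = set()   # edges ruled out so far
    verified = {}    # cache, so each pair is checked at most once
    forest = spanning_forest(n, dist, banned)
    while True:
        # Only edges that appear in the current forest are ever verified.
        for edge in forest:
            if edge not in verified:
                verified[edge] = overlap_confidence(*edge)
        rejected = {e for e in forest if verified[e] < threshold}
        if not rejected:
            break  # link structure is stable
        banned |= rejected
        forest = spanning_forest(n, dist, banned)

    # Connected components of the final forest are the view clusters.
    label = list(range(n))
    for i, j in forest:
        old, new = label[j], label[i]
        label = [new if l == old else l for l in label]
    groups = {}
    for view, l in enumerate(label):
        groups.setdefault(l, []).append(view)
    return forest, sorted(groups.values())
```

The expensive check is invoked only for pairs that appear in some intermediate forest, and the cache keeps each pair from being checked twice. The loop terminates because every pass bans at least one new edge from a finite set.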
