Optimal Reduction of Large Image Databases for Location Recognition

For some computer vision tasks, such as location recognition on mobile devices or Structure from Motion (SfM) computation from Internet photo collections, one wants to reduce a large set of images to a compact, representative subset, sometimes called "key frames" or a "skeletal set". We examine the problem of selecting a minimum set of such key frames from the point of view of discrete optimization, as the search for a minimum connected dominating set (CDS) of the graph of pairwise connections between the database images. Even the plain minimum dominating set (DS) problem is known to be NP-hard, and the constraint that the dominating set must be connected makes it harder still. We show how the minimum DS can nevertheless be solved to global optimality efficiently in practice by formulating it as an integer linear program (ILP). Furthermore, we show how to upgrade the solution to a connected dominating set with a second ILP if necessary, although the complete method is then no longer globally optimal. We also compare the proposed method to a previous greedy heuristic. Experiments with several image sets show that the greedy solution already performs remarkably well, and that the optimal solution yields key frame sets that are roughly 5% smaller and perform equally well in location recognition and SfM tasks.
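To make the dominating-set idea concrete, the following is a minimal sketch on a toy graph, not the method of the paper. It assumes the standard textbook formulation: one binary variable x_i per image, minimize the sum of all x_i, subject to x_i plus the x_j of all neighbours j of i being at least 1 for every image i. Because no ILP solver is assumed to be available, exhaustive enumeration stands in for it, which is feasible only for very small graphs. The greedy routine is a generic max-coverage heuristic and is not claimed to match the earlier heuristic referred to above. The graph, the vertex names, and all function names are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def is_dominating(adj, chosen):
    """Covering constraint: every vertex is chosen or has a chosen neighbour."""
    chosen = set(chosen)
    return all(v in chosen or adj[v] & chosen for v in adj)


def minimum_dominating_set(adj):
    """Exact minimum by enumeration in order of increasing size.

    Stand-in for an ILP solver; exponential time, toy graphs only.
    """
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if is_dominating(adj, subset):
                return set(subset)
    return set()


def greedy_dominating_set(adj):
    """Repeatedly pick the vertex that covers the most uncovered vertices."""
    uncovered = set(adj)
    chosen = set()
    while uncovered:
        best = max(sorted(adj), key=lambda v: len((adj[v] | {v}) & uncovered))
        chosen.add(best)
        uncovered -= adj[best] | {best}
    return chosen


def is_connected(adj, chosen):
    """True if the subgraph induced by `chosen` is connected."""
    chosen = set(chosen)
    if not chosen:
        return True
    start = next(iter(chosen))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in (adj[v] & chosen) - seen:
            seen.add(w)
            stack.append(w)
    return seen == chosen


# Hypothetical 9-image match graph: an edge means two images are connected.
edges = [
    ("u", "a1"), ("u", "a2"), ("u", "a3"), ("u", "w"),
    ("v", "b1"), ("v", "b2"), ("v", "b3"),
    ("w", "a1"), ("w", "a2"), ("w", "b1"), ("w", "b2"),
]
adj = build_graph(edges)
optimal = minimum_dominating_set(adj)
greedy = greedy_dominating_set(adj)
print(sorted(optimal), sorted(greedy), is_connected(adj, optimal))
```

On this toy graph the greedy heuristic first picks the high-degree vertex w and ends up with three key frames, whereas the exact search finds the two-vertex set {u, v}. That optimal set is not connected, since u and v share no edge, which is the kind of situation in which a further step is needed to obtain a connected dominating set.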
