Coherent image layout using an adaptive visual vocabulary

When querying a huge image database containing millions of images, the result of the query may still contain many thousands of images that need to be presented to the user. We consider the problem of arranging such a large set of images into a visually coherent layout, one that places similar images next to each other. Image similarity is determined using a bag-of-features model, and the layout is constructed from a hierarchical clustering of the image set by mapping an in-order traversal of the hierarchy tree into a space-filling curve. This layout method provides strong locality guarantees so we are able to quantitatively evaluate performance using standard image retrieval benchmarks. Performance of the bag-of-features method is best when the vocabulary is learned on the image set being clustered. Because learning a large, discriminative vocabulary is a computationally demanding task, we present a novel method for efficiently adapting a generic visual vocabulary to a particular dataset. We evaluate our clustering and vocabulary adaptation methods on a variety of image datasets and show that adapting a generic vocabulary to a particular set of images improves performance on both hierarchical clustering and image retrieval tasks.

[1]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Tommi S. Jaakkola,et al.  Fast optimal leaf ordering for hierarchical clustering , 2001, ISMB.

[3]  Jianping Fan,et al.  Semantic Image Browser: Bridging Information Visualization with Automated Intelligent Image Analysis , 2006, 2006 IEEE Symposium On Visual Analytics Science And Technology.

[4]  Martin Wattenberg A note on space-filling visualizations and space-filling curves , 2005, IEEE Symposium on Information Visualization, 2005. INFOVIS 2005..

[5]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Benjamin B. Bederson,et al.  PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps , 2001, UIST '01.

[7]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[9]  John C. Dalton,et al.  Hierarchical browsing and search of large image databases , 2000, IEEE Trans. Image Process..

[10]  Jiri Matas,et al.  Learning a Fine Vocabulary , 2010, ECCV.

[11]  Guillaume Pitel,et al.  Image clustering based on a shared nearest neighbors approach for tagged collections , 2008, CIVR '08.

[12]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Jiri Matas,et al.  Efficient representation of local geometry for large scale object retrieval , 2009, CVPR.

[14]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Kerry Rodden,et al.  Evaluating a visualisation of image similarity as a tool for image browsing , 1999, Proceedings 1999 IEEE Symposium on Information Visualization (InfoVis'99).

[16]  Leonidas J. Guibas,et al.  Image webs: Computing and exploiting connectivity in image collections , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17]  Jiri Matas,et al.  Geometric min-Hashing: Finding a (thick) needle in a haystack , 2009, CVPR.

[18]  Leonidas J. Guibas,et al.  A metric for distributions with applications to image databases , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[19]  Kerry Rodden,et al.  Does organisation by similarity assist image browsing? , 2001, CHI.

[20]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[21]  Gerald Schaefer,et al.  Image Database Navigation on a Hierarchical MDS Grid , 2006, DAGM-Symposium.

[22]  Stephen J. McKenna,et al.  Visualizing Image Collections Using High-Entropy Layout Distributions , 2010, IEEE Transactions on Multimedia.

[23]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Andrew Zisserman,et al.  Object Mining Using a Matching Graph on Very Large Image Collections , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[26]  Qi Tian,et al.  Visualization and User-Modeling for Browsing Personal Photo Libraries , 2004, International Journal of Computer Vision.

[27]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[28]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[29]  Patrick Baudisch,et al.  Time quilt: scaling up zoomable photo browsers for large, unstructured photo collections , 2005, CHI EA '05.

[30]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[31]  Cordelia Schmid,et al.  Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[32]  Rolf Niedermeier,et al.  Towards optimal locality in mesh-indexings , 1997, Discret. Appl. Math..