Discovering Details and Scene Structure with Hierarchical Iconoid Shift

Current landmark recognition engines are typically aimed at recognizing building-scale landmarks, but miss interesting details like portals, statues or windows. This is because they use a flat clustering that summarizes all photos of a building facade in one cluster. We propose Hierarchical Iconoid Shift, a novel landmark clustering algorithm capable of discovering such details. Instead of just a collection of clusters, the output of HIS is a set of dendrograms describing the detail hierarchy of a landmark. HIS is based on the novel Hierarchical Medoid Shift clustering algorithm that performs a continuous mode search over the complete scale space. HMS is completely parameter-free, has the same complexity as Medoid Shift and is easy to parallelize. We evaluate HIS on 800k images of 34 landmarks and show that it can extract an often surprising amount of detail and structure that can be applied, e.g., to provide a mobile user with more detailed information on a landmark or even to extend the landmark's Wikipedia article.

[1]  Yannis Avrithis,et al.  Retrieving landmark and non-landmark images from community photo collections , 2010, ACM Multimedia.

[2]  Luc Van Gool,et al.  I know what you did last summer: object-level auto-annotation of holiday snaps , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  Daniel DeMenthon,et al.  SPATIO-TEMPORAL SEGMENTATION OF VIDEO BY HIERARCHICAL MEAN SHIFT ANALYSIS , 2002 .

[4]  Yonatan Wexler,et al.  Hierarchical photo organization using geo-relevance , 2007, GIS.

[5]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Andrew Zisserman,et al.  Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets , 2011, International Journal of Computer Vision.

[7]  Steven M. Seitz,et al.  Scene Segmentation Using the Wisdom of Crowds , 2008, ECCV.

[8]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[9]  Bastian Leibe,et al.  Discovering favorite views of popular places with iconoid shift , 2011, 2011 International Conference on Computer Vision.

[10]  Francis Y. L. Chin,et al.  Algorithms for Updating Minimal Spanning Trees , 1978, J. Comput. Syst. Sci..

[11]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, International Journal of Computer Vision.

[13]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Dorin Comaniciu,et al.  The Variable Bandwidth Mean Shift and Data-Driven Scale Selection , 2001, ICCV.

[15]  Yang Song,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Nassir Navab,et al.  Region Graphs for Organizing Image Collections , 2010, ECCV Workshops.

[17]  Andrew P. Witkin,et al.  Scale-space filtering: A new approach to multi-scale description , 1984, ICASSP.

[18]  Takeo Kanade,et al.  Mode-seeking by Medoidshifts , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[20]  Weng-Keen Wong,et al.  Category detection using hierarchical mean shift , 2009, KDD.

[21]  Luke S. Zettlemoyer,et al.  3D Wikipedia , 2013, ACM Trans. Graph..

[22]  Andrew Zisserman,et al.  Object Mining Using a Matching Graph on Very Large Image Collections , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[23]  Torsten Sattler,et al.  SCRAMSAC: Improving RANSAC's efficiency with a spatial consistency filter , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Luc Van Gool,et al.  Size Does Matter: Improving Object Recognition and 3D Reconstruction with Cross-Media Analysis of Image Clusters , 2010, ECCV.

[25]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[26]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  Yee Leung,et al.  Clustering by Scale-Space Filtering , 2000, IEEE Trans. Pattern Anal. Mach. Intell..