Exploitation and Exploration Balanced Hierarchical Summary for Landmark Images

Despite significant progress in image understanding and search, satisfying both exploration and exploitation within a single system remains an open challenge. In the context of landmark images, this means that a system should help users not only to quickly locate the photo they are interested in (exploitation), but also to discover parts of the landmark they have never seen before (exploration), a common request as evidenced by many recent multimedia studies. To the best of our knowledge, existing systems focus mainly on either exploration (e.g., photo browsing) or exploitation (e.g., representative photo identification), whereas users' needs for exploration and exploitation are dynamically mixed. In this paper, we tackle this challenge by organizing landmark images into a hierarchical summary that gives users the flexibility to conduct both exploration and exploitation. To construct the hierarchical summary, we introduce two principles: the coherence principle and the diversity principle. The concept underlying both principles is the “detail-level,” which measures how much detail of a given landmark an image reflects. A new objective function is derived from the definitions of the exploration and exploitation experiences in terms of detail-level. The problem of finding an optimal hierarchical summary is formulated as a search over a space of trees for the one that achieves the best objective score. Extensive quantitative experiments and comprehensive user studies show that the optimized hierarchical summary satisfies both experiences simultaneously.
