Saliency-GD: A TF-IDF Analogy for Landmark Image Mining

In this paper we address unsupervised landmark mining: automatically discovering frequently appearing landmarks in an unstructured image collection. Landmark mining often suffers from false matches caused by cluttered backgrounds and foregrounds, inter-class similarities, and other confounders. Analogous to TF-IDF in image retrieval, we propose the Saliency-GD weighting scheme for visual words, which can be easily integrated into state-of-the-art local-feature-based visual instance mining frameworks. Saliency detection provides feature weighting in image space from an attention perspective, while in feature space the knowledge of geographic density (GD), transferred from a separate training dataset, yields a multimodal selection of meaningful visual words. Experiments on public landmark datasets show that the Saliency-GD weighting scheme greatly improves landmark mining performance by increasing the discriminative power of visual features.
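To make the TF-IDF analogy concrete, the following is a minimal Python sketch of one plausible reading of the scheme: a saliency score sampled at each keypoint acts as a TF-like, image-space weight, while a precomputed geographic-density score per visual word acts as an IDF-like, corpus-level weight. All names here (saliency_weights, gd_weights, weighted_bow, geo_density) are illustrative assumptions and not taken from the paper; the actual formulation may differ.

import numpy as np

def saliency_weights(keypoints, saliency_map):
    """Image-space weight: sample the saliency map at each keypoint location."""
    h, w = saliency_map.shape
    weights = []
    for x, y in keypoints:
        xi = min(max(int(round(x)), 0), w - 1)
        yi = min(max(int(round(y)), 0), h - 1)
        weights.append(float(saliency_map[yi, xi]))
    return np.asarray(weights)

def gd_weights(word_ids, geo_density):
    """Feature-space weight: per-visual-word score derived from geographic
    density statistics learned on a separate geotagged training set
    (assumed here to be a precomputed dict: word_id -> score in [0, 1])."""
    return np.asarray([geo_density.get(w, 0.0) for w in word_ids])

def weighted_bow(word_ids, keypoints, saliency_map, geo_density, vocab_size):
    """Bag-of-visual-words histogram in which each occurrence is weighted by
    saliency (TF-like) and by geographic density (IDF-like)."""
    sal = saliency_weights(keypoints, saliency_map)
    gd = gd_weights(word_ids, geo_density)
    hist = np.zeros(vocab_size)
    for w, s, g in zip(word_ids, sal, gd):
        hist[w] += s * g
    # L2-normalize, as is common for bag-of-words representations.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

Multiplying the two weights mirrors the TF x IDF product; other combinations, such as thresholding visual words by their GD score before matching, would fit the same framework and are equally consistent with the abstract.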
