Fusing concept detection and geo context for visual search

Given the proliferation of geo-tagged images, the question of how to exploit geo tags and the underlying geo context for visual search is emerging. Based on the observation that the importance of geo context varies over concepts, we propose a concept-based image search engine which fuses visual concept detection and geo context in a concept-dependent manner. Compared to individual content-based and geo-based concept detectors and their uniform combination, concept-dependent fusion shows improvements. Moreover, since the proposed search engine is trained on social-tagged images alone without the need of human interaction, it is flexible to cope with many concepts. Search experiments on 101 popular visual concepts justify the viability of the proposed solution. In particular, for 79 out of the 101 concepts, the learned weights yield improvements over the uniform weights, with a relative gain of at least 5% in terms of average precision.

[1]  B. S. Manjunath,et al.  Global annotation on georeferenced photographs , 2009, CIVR '09.

[2]  Chong-Wah Ngo,et al.  Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study , 2010, IEEE Transactions on Multimedia.

[3]  Mor Naaman,et al.  Generating summaries and visualization for large collections of geo-referenced photographs , 2006, MIR '06.

[4]  Mor Naaman,et al.  How flickr helps us make sense of the world: context and content in community-contributed media collections , 2007, ACM Multimedia.

[5]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[6]  Mohan S. Kankanhalli,et al.  Multimodal fusion for multimedia analysis: a survey , 2010, Multimedia Systems.

[7]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  B. S. Manjunath,et al.  Spirittagger: a geo-aware tag suggestion tool mined from flickr , 2008, MIR '08.

[10]  Bruno Martins,et al.  Tag recommendation for georeferenced photos , 2011, LBSN '11.

[11]  Keiji Yanai,et al.  Geotagged iage rcognition by cmbining tree dfferent knds of golocation fatures , 2010, ACCV 2010.

[12]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[13]  Arnold W. M. Smeulders,et al.  Real-Time Visual Concept Classification , 2010, IEEE Transactions on Multimedia.

[14]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[15]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[16]  Adrian Ulges,et al.  Relevance filtering meets active learning: improving web-based concept detectors , 2010, MIR '10.

[17]  Marcel Worring,et al.  Social negative bootstrapping for visual categorization , 2011, ICMR '11.

[18]  Gang Wang,et al.  On the sampling of web images for learning visual concept classifiers , 2010, CIVR '10.

[19]  Shih-Fu Chang,et al.  To search or to label?: predicting the performance of search-based automatic image classifiers , 2006, MIR '06.

[20]  Jiebo Luo,et al.  Leveraging probabilistic season and location context models for scene understanding , 2008, CIVR '08.

[21]  Keiji Yanai,et al.  Geotagged Image Recognition by Combining Three Different Kinds of Geolocation Features , 2010, ACCV.

[22]  Hsuan-Tien Lin,et al.  A note on Platt’s probabilistic outputs for support vector machines , 2007, Machine Learning.

[23]  Subhransu Maji,et al.  Classification using intersection kernel support vector machines is efficient , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Jiebo Luo,et al.  Event recognition: viewing the world with a third eye , 2008, ACM Multimedia.

[25]  Marcel Worring,et al.  Unsupervised multi-feature tag relevance learning for social image retrieval , 2010, CIVR '10.

[26]  W. Bruce Croft,et al.  Linear feature-based models for information retrieval , 2007, Information Retrieval.

[27]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[28]  Jiebo Luo,et al.  Inferring generic activities and events from image content and bags of geo-tags , 2008, CIVR '08.

[29]  John R. Smith,et al.  A framework for moderate vocabulary semantic visual concept detection , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[30]  Jiebo Luo,et al.  Enhancing semantic and geographic annotation of web images via logistic canonical correlation regression , 2009, ACM Multimedia.