Learning concept bundles for video search with complex queries

Classifiers for primitive visual concepts like "car", "sky" have been well developed and widely used to support video search on simple queries. However, it is usually ineffective for complex queries like "one or more people at a table or desk with a computer visible", as they carry semantics far more complex and different from simply aggregating the meanings of their constituent primitive concepts. To facilitate video search of complex queries, we propose a higher-level semantic descriptor named "concept bundle", which integrates multiple primitive concepts, such as "(soccer, fighting)", "(lion, hunting, zebra)" etc, to describe the visual representation of the complex semantics. The proposed approach first automatically selects informative concept bundles. It then builds a novel concept bundle classifier based on multi-task learning by exploiting the relatedness between concept bundle and its primitive concepts. To model a complex query, it proposes an optimal selection strategy to select related primitive concepts and concept bundles by considering both their classifier performance and semantic relatedness with respect to the query. The final results are generated by fusing the individual results from these selected primitive concepts and concept bundles. Extensive experiments are conducted on two video datasets: TRECVID 2008 and YouTube datasets. The experimental results indicate that: (a) our concept bundle learning approach outperforms the state-of-the-art methods by at least 19% and 29% on TRECVID 2008 and YouTube datasets, respectively; and (b) the use of concept bundles can improve the search performance for complex queries by at least 37.5% on TRECVID 2008 and 52% on YouTube datasets.

[1]  Meng Wang,et al.  Visual query suggestion , 2009, ACM Multimedia.

[2]  Marcel Worring,et al.  Adding Semantics to Detectors for Video Retrieval , 2007, IEEE Transactions on Multimedia.

[3]  Meng Wang,et al.  Unified Video Annotation via Multigraph Learning , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[4]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[5]  Marcel Worring,et al.  Concept-Based Video Retrieval , 2009, Found. Trends Inf. Retr..

[6]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[7]  Tao Mei,et al.  Correlative multi-label video annotation , 2007, ACM Multimedia.

[8]  Qiang Yang,et al.  Transferring Multi-device Localization Models using Latent Multi-task Learning , 2008, AAAI.

[9]  Chong-Wah Ngo,et al.  Fusing semantics, observability, reliability and diversity of concept detectors for video search , 2008, ACM Multimedia.

[10]  Ivor W. Tsang,et al.  Domain Transfer SVM for video concept detection , 2009, CVPR 2009.

[11]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[12]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[13]  Kai Wang,et al.  A syntactic tree matching approach to finding similar questions in community-based qa services , 2009, SIGIR.

[14]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[15]  Shuicheng Yan,et al.  Visual classification with multi-task joint sparse representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.

[17]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Meng Wang,et al.  MSRA atT TRECVID 2008: High-Level Feature Extraction and Automatic Search , 2008, TRECVID.

[19]  Yi Yang,et al.  Interactive Video Indexing With Statistical Active Learning , 2012, IEEE Transactions on Multimedia.

[20]  Jin Zhao,et al.  Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting , 2006, CIVR.

[21]  Dong Wang,et al.  The importance of query-concept-mapping for automatic video retrieval , 2007, ACM Multimedia.

[22]  Chong-Wah Ngo,et al.  Semantic context transfer across heterogeneous sources for domain adaptive video search , 2009, ACM Multimedia.

[23]  Chong-Wah Ngo,et al.  Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study , 2010, IEEE Transactions on Multimedia.

[24]  Sheng Tang,et al.  TRECVID 2008 Participation by MCG-ICT-CAS , 2008, TRECVID.

[25]  Wei-Hao Lin,et al.  Assessing Effectiveness in Video Retrieval , 2005, CIVR.

[26]  Chong-Wah Ngo,et al.  Selection of Concept Detectors for Video Search by Ontology-Enriched Semantic Spaces , 2008, IEEE Transactions on Multimedia.

[27]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[28]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[29]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[30]  Shih-Fu Chang,et al.  Columbia University’s Baseline Detectors for 374 LSCOM Semantic Visual Concepts , 2007 .

[31]  Huan Liu,et al.  Enhancing accessibility of microblogging messages using semantic knowledge , 2011, CIKM '11.

[32]  M. Shyu,et al.  Florida International University and University of Miami TRECVID 2008 - High Level Feature Extraction , 2008, TRECVID.

[33]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[34]  Djoerd Hiemstra,et al.  Building Detectors to Support Searches on Combined Semantic Concepts , 2007 .

[35]  Meng Wang,et al.  Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation , 2009, IEEE Transactions on Multimedia.

[36]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[37]  Shih-Fu Chang,et al.  Semantic Concept Classification by Joint Semi-supervised Learning of Feature Subspaces and Support Vector Machines , 2008, ECCV.

[38]  Nenghai Yu,et al.  Flickr distance , 2008, ACM Multimedia.

[39]  Rong Yan,et al.  Semantic concept-based query expansion and re-ranking for multimedia retrieval , 2007, ACM Multimedia.

[40]  Rong Yan,et al.  Cross-domain video concept detection using adaptive svms , 2007, ACM Multimedia.