Infrequent concept pairs detection in multimedia documents

Single visual concept detection in videos is a hard task, especially for infrequent concepts or for those that are difficult to model. The problem becomes even harder in the case of concept pairs. Two main directions can be followed to tackle it: 1) combine the predictions of the corresponding individual detectors, in a fusion scheme similar to those used in classical information retrieval, or 2) build supervised learners for the pairs of concepts, generating pair-level annotations from the co-occurrences of the two individual concepts. Each of these approaches has advantages and drawbacks. We evaluated them in the context of the concept pair detection subtask of the TRECVid 2013 semantic indexing (SIN) task and found that information retrieval-like fusion of concept detection scores outperforms the learning-based approaches. The described methods outperform the best official result of this evaluation campaign by 9% in terms of relative improvement in mean average precision (MAP).
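To make the two directions concrete, here is a minimal Python sketch. The function names, the specific fusion operators (product, min, weighted geometric mean), and the toy scores are illustrative assumptions, not the exact operators evaluated in the paper; individual detector scores are assumed to be calibrated to [0, 1] beforehand (e.g., with Platt scaling).

```python
import numpy as np

def fuse_pair_scores(scores_a, scores_b, method="product", alpha=0.5):
    """Direction 1: late fusion of two individual concept detectors'
    per-shot scores into a single score for the concept pair.
    Scores are assumed calibrated to [0, 1]."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if method == "product":    # AND-like combination
        return a * b
    if method == "min":        # conservative AND
        return np.minimum(a, b)
    if method == "geometric":  # weighted geometric mean
        return a ** alpha * b ** (1.0 - alpha)
    raise ValueError(f"unknown fusion method: {method}")

def pair_labels(labels_a, labels_b):
    """Direction 2: derive concept-pair training annotations from the
    two individual concept annotations (positive iff both are positive);
    a supervised learner can then be trained directly on the pair."""
    return np.logical_and(labels_a, labels_b).astype(int)

# Toy usage: rank 5 shots for a concept pair by fused score.
sa = np.array([0.9, 0.2, 0.7, 0.4, 0.8])
sb = np.array([0.8, 0.9, 0.1, 0.5, 0.6])
ranking = np.argsort(-fuse_pair_scores(sa, sb, method="product"))
print(ranking)  # shot indices ordered by decreasing fused pair score
```

The conjunction in pair_labels mirrors the annotation-generation step of the second direction; the first direction needs no pair-level training at all, which is part of why it is attractive for infrequent pairs with few joint positive examples.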
