Conceptual feedback for semantic multimedia indexing

In this paper, we consider the problem of automatically detecting a large number of visual concepts in images or video shots. State-of-the-art systems involve feature (descriptor) extraction, classification (supervised learning), and fusion when several descriptors and/or classifiers are used. Though direct multi-label approaches are considered in some works, detection scores are often computed independently for each target concept. We propose a method, which we call “conceptual feedback”, that improves the overall detection performance by implicitly taking into account the relations between concepts. The vector of normalized detection scores is added to the pool of available descriptors. It is then processed just like the other descriptors in the normalization, optimization and classification steps. The resulting detection scores are finally fused with the detection scores already obtained with the original descriptors. This feedback of the global detection scores into the pool of descriptors can be iterated several times. It is also compatible with the use of the temporal context, which further improves the overall performance by taking into account the local homogeneity of video contents. The method has been evaluated in the context of the TRECVID 2012 semantic indexing task, which involves the detection of 346 visual or multimodal concepts. Combined with temporal re-scoring, the proposed method increased the global system performance, measured as mean average precision (MAP), from 0.2613 to 0.3014 (a relative improvement of 15.3%), whereas temporal re-scoring alone increased it only from 0.2613 to 0.2691 (+3.0%).
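The feedback loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the ridge-regression classifier, the z-normalization, the weighted-average fusion, and all function and parameter names (`fit_ridge`, `predict_ridge`, `conceptual_feedback`, `weight`, `reg`) are assumptions chosen for brevity, and the optimization step is omitted. The sketch also assumes that detection scores are available for both training and test samples.

```python
import numpy as np

def fit_ridge(X, Y, reg=1.0):
    """Stand-in classifier (assumption): ridge regression with a bias term.
    X is (n_samples, n_features); Y is (n_samples, n_concepts)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict_ridge(W, X):
    """Apply the ridge model to new samples; returns one score per concept."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def conceptual_feedback(train_scores, train_labels, test_scores,
                        n_iter=1, weight=0.5):
    """Use the vector of detection scores as an extra descriptor.

    Each iteration: normalize the current scores, train one model per
    concept on the full score vector (so every concept can draw on the
    scores of all others), then fuse the new scores with the current ones.
    The fusion rule (weighted average) is an assumption of this sketch."""
    train_cur = np.asarray(train_scores, dtype=float)
    test_cur = np.asarray(test_scores, dtype=float)
    for _ in range(n_iter):
        # Normalization step: z-score each concept using training statistics.
        mean = train_cur.mean(axis=0, keepdims=True)
        std = np.maximum(train_cur.std(axis=0, keepdims=True), 1e-12)
        train_desc = (train_cur - mean) / std
        test_desc = (test_cur - mean) / std
        # Classification step on the score-vector descriptor.
        W = fit_ridge(train_desc, train_labels)
        train_fb = predict_ridge(W, train_desc)
        test_fb = predict_ridge(W, test_desc)
        # Fusion step: combine feedback scores with the existing scores.
        train_cur = weight * train_fb + (1.0 - weight) * train_cur
        test_cur = weight * test_fb + (1.0 - weight) * test_cur
    return test_cur

# Usage with synthetic data: 50 training shots, 10 test shots, 4 concepts.
rng = np.random.default_rng(0)
train_scores = rng.random((50, 4))
train_labels = (rng.random((50, 4)) > 0.5).astype(float)
test_scores = rng.random((10, 4))
refined = conceptual_feedback(train_scores, train_labels, test_scores, n_iter=2)
```

Setting `n_iter` above 1 corresponds to iterating the feedback several times. With `weight=0.0` the function returns the original test scores unchanged, which gives a convenient baseline for comparison.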
