LIG at TRECVid 2015: Semantic Indexing

LIG participated in the semantic indexing main task and also contributed to the organization of this task. This paper describes these participations, which are quite similar to our previous year's participation (within the Quaero consortium). Our approach uses a six-stage processing pipeline to compute a score for the likelihood that a video shot contains a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline comprises the following stages: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We used a number of different descriptors and a hierarchical fusion strategy. We also used conceptual feedback, adding a vector of classification scores to the pool of descriptors. The main innovation this year was the inclusion of semantic descriptors computed with a deep learning method. We also used multiple frames for some features, which led to a significant improvement. The best LIG run has a Mean Inferred Average Precision of 0.2935, ranking 5th out of 15 participants.
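The six-stage pipeline above can be sketched as a chain of per-shot scoring functions. This is a minimal illustrative sketch, not the authors' implementation: the random descriptors, the power-law/L2 normalization used for the optimization stage, the linear classifier, the weighted-average fusion, and the additive re-ranking boost are all assumptions chosen only to show how the stages compose.

```python
import numpy as np

# Illustrative sketch of the six-stage scoring pipeline; every component
# below is a stand-in assumption, not the method from the paper.

def extract_descriptors(n_shots, dim, seed):
    # Stage 1: descriptor extraction (random stand-in for visual features).
    return np.random.default_rng(seed).normal(size=(n_shots, dim))

def optimize_descriptors(X, alpha=0.5):
    # Stage 2: power-law then L2 normalization, a common optimization scheme.
    X = np.sign(X) * np.abs(X) ** alpha
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)

def classify(X, w):
    # Stage 3: linear scoring stands in for the supervised classifier.
    return X @ w

def fuse(score_matrix, weights=None):
    # Stages 4-5: weighted average fusion, used here both for descriptor
    # variants and for the higher-level (hierarchical) fusion.
    scores = np.asarray(score_matrix, dtype=float)
    weights = np.ones(scores.shape[0]) if weights is None else np.asarray(weights, float)
    return weights @ scores / weights.sum()

def rerank(scores, boost=0.1):
    # Stage 6: re-ranking sketched as locally re-scoring the current top shot.
    out = np.asarray(scores, dtype=float).copy()
    out[np.argmax(out)] += boost
    return out

# Score 10 shots for one target concept using two descriptor variants.
n_shots, dim = 10, 8
w = np.ones(dim) / dim
variants = [classify(optimize_descriptors(extract_descriptors(n_shots, dim, s)), w)
            for s in (0, 1)]
final = rerank(fuse(variants))
ranking = np.argsort(-final)  # shots ordered by decreasing concept score
```

In the real system each stage is far richer (many descriptor types, trained classifiers, hierarchical fusion weights, conceptual feedback), but the data flow shots → descriptors → scores → fused scores → re-ranked list is the one the abstract describes.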
