ECL-LIRIS at TRECVID 2011: Semantic Indexing

This is the first time our team has participated in TRECVID. This paper summarizes the approach we submitted to the Semantic Indexing (SIN) task of TRECVID 2011. Our approach adopts the bag-of-features method to transform the original visual and audio features into histogram features, using a pre-trained codebook. After feature transformation, one-versus-others SVMs with a Chi-square kernel are trained. In the decision step, the averaged probability is calculated as the final score used to rank shots. Under this framework, we tested four visual features (dense grid SIFT, color SIFT, OLBPC and DAISY) together with one audio feature consisting of MFCC with delta and acceleration coefficients. Our audio-visual combination model achieves our best results in terms of mean xinfAP. In addition, given the huge amount of data this year, we employed several speedup strategies, such as k-means clustering with GPU acceleration and the homogeneous kernel map. With these efforts, we ranked 12th out of 19 teams in the full run and 13th out of 27 teams in the light run.
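The pipeline described above (codebook quantization, a Chi-square kernel, and score averaging) can be illustrated with a minimal NumPy sketch. This is not the submitted system: the function names `bof_histogram`, `additive_chi2_kernel` and `average_fusion` are hypothetical, hard nearest-codeword assignment with L1 normalization is assumed, the additive form of the Chi-square kernel is assumed (the form that homogeneous kernel maps approximate), and the averaging is assumed to run over per-feature classifier probabilities. SVM training itself is omitted.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword and
    return an L1-normalized histogram (assumed quantization scheme)."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def additive_chi2_kernel(X, Y, eps=1e-12):
    """Additive Chi-square kernel k(x, y) = sum_i 2 x_i y_i / (x_i + y_i)
    between the rows of X and the rows of Y (assumed kernel form)."""
    prod = X[:, None, :] * Y[None, :, :]
    summ = X[:, None, :] + Y[None, :, :]
    # eps guards against 0/0 for bins that are empty in both histograms.
    return (2.0 * prod / (summ + eps)).sum(axis=2)

def average_fusion(prob_per_feature):
    """Average per-feature probabilities into one score per shot
    (assumed to be what 'averaged probability' refers to)."""
    return np.mean(np.stack(prob_per_feature, axis=0), axis=0)

# Toy usage with made-up numbers: one shot histogram, then fusion and ranking.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [10.0, 11.0]])
hist = bof_histogram(descriptors, codebook)
gram = additive_chi2_kernel(hist[None, :], hist[None, :])

visual_probs = np.array([0.2, 0.8, 0.5])  # hypothetical classifier outputs
audio_probs = np.array([0.4, 0.6, 0.1])
scores = average_fusion([visual_probs, audio_probs])
ranking = np.argsort(-scores)  # shot indices, highest fused score first
```

A precomputed Gram matrix of this kind can be passed to any SVM solver that accepts custom kernels. Averaging is the simplest late-fusion rule and needs no extra training data, at the cost of weighting every feature equally.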
