CEA LIST at TRECVID 2012: Semantic Indexing and Instance Search

This paper reports the experiments carried out for the semantic indexing (SIN) and instance search (INS) tasks at TRECVID 2012. For the SIN task, we evaluated two recently proposed features within a simple one-versus-all linear SVM framework. The first one is a motion histogram based on trajectory vectors. The second one is a bag-of-visterms that takes into account the spatial consistency of descriptors. For the INS task, we proposed a descriptor based on local descriptor matching that scales to the considered corpus. A second contribution for INS consisted in studying several late fusion schemes. Preliminary experiments were conducted on the INS 11 corpus to choose the best strategy, leading to results in the top 5% of past results. While these preliminary results were very promising, the 2012 results are above the median of participating runs but far from reproducing the previous year's performance. The significance of the results is thus studied, showing that significant dif
