LIG at MediaEval 2013 Affect Task: Use of a Generic Method and Joint Audio-Visual Words

This paper describes the LIG participation in the MediaEval 2013 Affect Task on violent scenes detection in Hollywood movies. We submitted four runs at the shot level for each of the two subtasks: objective violent scenes detection and subjective violent scenes detection. The four runs are: a hierarchical fusion of descriptors and classifier combinations, the same fusion with joint audio-visual words added, and these two runs with temporal re-ranking applied. On the official MAP@100 metric, our reference run reached 69% for subjective violence and 52% for objective violence. The joint audio-visual words bring a slight improvement in MAP@100 and improve precision at the head of the returned list, while temporal re-ranking improves P@100.
