Joint audio-visual words for violent scene detection in videos

This paper presents an audio-visual data representation for detecting violent scenes in movies. Existing work in this area considers either the visual information or the audio information, or at most a conventional fusion of the two. So far, few approaches have explored the mutual dependence between the two modalities for violent scene detection. We therefore propose a descriptor that provides multimodal audio and visual cues, first by assembling the audio and visual descriptors, then by statistically revealing the joint multimodal patterns. The experimental validation was carried out within the "Violent Scenes Detection" task of MediaEval 2013. The results show the potential of the proposed approach compared with methods that use the audio and visual descriptors separately or that rely on other types of fusion.
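The two steps named above (assembling the audio and visual descriptors, then finding joint patterns statistically) can be illustrated with a minimal sketch. The abstract does not say which statistical procedure is used, so plain k-means clustering over concatenated feature vectors is an assumption here, as are the toy feature values and every function name (`joint_descriptors`, `kmeans`, `nearest`, `bag_of_words`). It is an illustration of the general idea of joint audio-visual words, not the authors' implementation.

```python
import random
from math import dist


def joint_descriptors(audio, visual):
    """Concatenate time-aligned audio and visual feature vectors (assumed alignment)."""
    if len(audio) != len(visual):
        raise ValueError("audio and visual sequences must be aligned")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; the returned centroids play the role of joint 'words'.

    Assumption: k-means stands in for the unspecified statistical step.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bag_of_words(points, centroids):
    """Normalized histogram of joint-word occurrences for one video segment."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy, made-up features: 2-D audio and 1-D visual descriptors per time step.
audio = [[0.1, 0.2], [0.9, 0.8], [0.1, 0.1], [0.8, 0.9]]
visual = [[0.0], [1.0], [0.1], [0.9]]

joint = joint_descriptors(audio, visual)
words = kmeans(joint, k=2)
histogram = bag_of_words(joint, words)
```

In such a setup the histogram would be the segment-level descriptor passed to a classifier. Clustering the concatenated vectors, rather than clustering each modality separately and merging the histograms afterwards, is what lets a word capture a co-occurring audio and visual pattern.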
