Audio-Visual Foreground Extraction for Event Characterization

This paper presents a method that integrates audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. The audio-visual association is performed on-line by exploiting the concept of synchrony. Experimental tests on event classification and clustering show the potential of the proposed approach, also in comparison with the results obtained using each modality alone.
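
To make the idea of synchrony-based association concrete, the following is a minimal sketch and not the method of the paper, whose synchrony measure is not specified in this abstract. It assumes that the audio and visual BG/FG modules each emit a per-frame binary foreground flag, and that an audio event is associated with the visual region whose foreground onsets fall within a few frames of the audio onsets. The names `onsets`, `synchrony_score`, `associate`, and the parameters `tolerance` and `threshold` are hypothetical.

```python
def onsets(activity):
    """Return the frame indices where a binary sequence switches from 0 to 1."""
    return [t for t in range(len(activity))
            if activity[t] and (t == 0 or not activity[t - 1])]


def synchrony_score(audio_fg, visual_fg, tolerance=1):
    """Fraction of audio onsets that have a visual onset within `tolerance` frames.

    Assumption: both sequences are sampled at the same frame rate.
    """
    audio_on = onsets(audio_fg)
    visual_on = onsets(visual_fg)
    if not audio_on:
        return 0.0
    matched = sum(
        1 for a in audio_on
        if any(abs(a - v) <= tolerance for v in visual_on)
    )
    return matched / len(audio_on)


def associate(audio_fg, visual_regions, tolerance=1, threshold=0.5):
    """Return (region_name, score) for the most synchronous visual region.

    `visual_regions` maps a hypothetical region name to its binary
    foreground sequence. If no region reaches `threshold`, the name is None.
    """
    if not visual_regions:
        return (None, 0.0)
    scores = {name: synchrony_score(audio_fg, fg, tolerance)
              for name, fg in visual_regions.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return (best, scores[best])
    return (None, scores[best])


if __name__ == "__main__":
    # Toy data: audio foreground starts at frames 2 and 7.
    audio = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
    regions = {
        "door": [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],  # onsets at frames 2 and 8
        "desk": [1, 1, 0, 0, 0, 1, 0, 0, 0, 0],  # onsets at frames 0 and 5
    }
    print(associate(audio, regions))
```

Matching onsets rather than raw overlap is a design choice for this sketch: it keeps a region that is continuously in the visual foreground from being associated with every sound. A real system would more likely work on continuous feature streams than on thresholded flags.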
