Paper information - Audio-visual event detection using duration dependent input output Markov models

Audio-visual event detection using duration dependent input output Markov models

Analyzing audio-visual data and detecting semantic events with spatio-temporal support is a challenging multimedia understanding problem. The difficulty lies in the gap between low-level media features and high-level semantic concepts. We introduce a duration dependent input output Markov model (DDIOMM) to detect events based on multiple modalities. The DDIOMM combines the ability to model non-exponential duration densities with the mapping of input sequences to output sequences. We test the DDIOMM by modeling the audio-visual event "explosion". We compare the detection performance of the DDIOMM with that of the IOMM and the HMM. Experiments reveal that modeling duration improves detection performance.
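To make the duration point in the abstract concrete, the sketch below contrasts the duration distribution implied by an ordinary Markov state with an explicitly specified one. It is not the DDIOMM from the paper. It relies only on the standard fact that a state with a fixed self-transition probability `a` stays for `d` steps with probability `a**(d-1) * (1 - a)`, which always peaks at `d = 1`. The function names, the self-transition value 0.8, and the explicit weights are illustrative assumptions.

```python
def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by a Markov state with a fixed self-transition
    probability, truncated at max_duration (so it sums to slightly less than 1)."""
    return [
        (self_loop_prob ** (d - 1)) * (1.0 - self_loop_prob)
        for d in range(1, max_duration + 1)
    ]


def explicit_duration_pmf(weights):
    """Normalize arbitrary non-negative weights into an explicit duration pmf.
    Entry i is the probability of staying exactly i + 1 steps."""
    total = float(sum(weights))
    return [w / total for w in weights]


def most_likely_duration(pmf):
    """Return the 1-based duration with the highest probability."""
    return max(range(len(pmf)), key=lambda i: pmf[i]) + 1


# Hypothetical values chosen only for illustration.
geometric = geometric_duration_pmf(self_loop_prob=0.8, max_duration=10)
explicit = explicit_duration_pmf([1, 2, 4, 8, 4, 2, 1, 0, 0, 0])

print("geometric mode:", most_likely_duration(geometric))
print("explicit mode:", most_likely_duration(explicit))
```

However the self-transition probability is chosen, the geometric form is monotonically decreasing, so it cannot express an event that typically lasts several frames. An explicit duration density can place its peak at any duration, which is the kind of flexibility that duration-dependent models aim for.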

Thomas S. Huang | M. R. Naphade | A. Garg
