Multimodal Segmental-Based Modeling of Tennis Video Broadcasts

Efficient multimodal fusion is a key feature of future video indexing systems. Hidden Markov models provide a powerful framework for video structure analysis, but they require all video modalities to be strictly synchronous. Taking the analysis of tennis broadcasts as a case study, we introduce segment models into video indexing. Segment models are a generalization of hidden Markov models in which the fusion of different modalities can be performed with relaxed synchrony constraints. In our experiments, segment models performed marginally better than hidden Markov models.
