Unsupervised discovery of multilevel statistical video structures using hierarchical hidden Markov models

Structure elements in a time sequence (e.g., video) are repetitive segments with consistent deterministic or stochastic characteristics. While most existing work on structure detection follows a supervised paradigm, this paper proposes a fully unsupervised statistical solution. We present a unified approach to structure discovery in long video sequences that simultaneously finds the statistical descriptions of structures and locates the segments that match those descriptions. We model the multilevel statistical structure as hierarchical hidden Markov models and present efficient algorithms for learning both the parameters and the model structure. When tested on a specific domain, soccer video, the unsupervised learning scheme achieves very promising results: it automatically discovers the statistical descriptions of high-level structures, and at the same time it detects the discovered structures in unlabelled videos with slightly better accuracy than a supervised approach designed with domain knowledge and trained with comparable hidden Markov models.
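To make the idea of "locating segments that match the descriptions" concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a hypothetical two-level model (2 top-level states, each with 2 child states) with hand-set parameters, flattens it into an ordinary HMM, and uses Viterbi decoding to recover top-level segments from a toy sequence of discrete symbols. All names and numbers (`flatten_transitions`, `exit_prob`, the emission table, the uniform child entry) are illustrative assumptions; the learning of parameters and model structure described in the abstract is not shown.

```python
import math

# Hypothetical two-level model: 2 top-level states, each owning 2 child states.
# Flat state index = top * N_CHILD + child.
N_TOP, N_CHILD = 2, 2
N_FLAT = N_TOP * N_CHILD


def flatten_transitions(top_trans, child_trans, exit_prob):
    """Build the transition matrix of the equivalent flat HMM.

    Assumption: every child state exits its sub-model with probability
    exit_prob, after which a top-level state is drawn from top_trans and
    one of its children is entered uniformly at random.
    """
    flat = [[0.0] * N_FLAT for _ in range(N_FLAT)]
    for q in range(N_TOP):
        for i in range(N_CHILD):
            src = q * N_CHILD + i
            for q2 in range(N_TOP):
                for j in range(N_CHILD):
                    p = exit_prob * top_trans[q][q2] / N_CHILD
                    if q2 == q:
                        p += (1.0 - exit_prob) * child_trans[q][i][j]
                    flat[src][q2 * N_CHILD + j] = p
    return flat


def viterbi(obs, init, trans, emit):
    """Return the most likely flat state sequence (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(init)
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev = score
        # Best predecessor for each current state.
        ptr = [max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
               for s in range(n)]
        score = [prev[ptr[s]] + log(trans[ptr[s]][s]) + log(emit[s][o])
                 for s in range(n)]
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def top_level_segments(flat_path):
    """Collapse flat states to top-level labels and (label, start, end) runs."""
    labels = [s // N_CHILD for s in flat_path]
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t - 1))
            start = t
    return labels, segments


# Hand-set toy parameters (assumptions, not learned from data).
top_trans = [[0.2, 0.8], [0.8, 0.2]]
child_trans = [[[0.7, 0.3], [0.3, 0.7]] for _ in range(N_TOP)]
flat_trans = flatten_transitions(top_trans, child_trans, exit_prob=0.05)
init = [1.0 / N_FLAT] * N_FLAT
# Each flat state strongly prefers one of four discrete symbols.
emit = [[0.85 if k == s else 0.05 for k in range(4)] for s in range(N_FLAT)]

obs = [0, 1, 0, 0, 1, 1, 2, 3, 3, 2, 2, 3]
labels, segments = top_level_segments(viterbi(obs, init, flat_trans, emit))
print(segments)
```

Flattening keeps the example short, but it discards the parameter sharing that makes a hierarchical model compact; it is used here only to illustrate decoding, on the assumption that the parameters are already known.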
