Structure analysis of soccer video with domain knowledge and hidden Markov models

In this paper, we present statistical techniques for parsing the structure of produced soccer programs. The problem is important for applications such as personalized video streaming and browsing systems, in which videos are segmented into different states and important states are selected based on user preferences. While prior work focuses on the detection of special events such as goals or corner kicks, this paper is concerned with generic structural elements of the game. We define two mutually exclusive states of the game, play and break, based on the rules of soccer. Automatic detection of such generic states is a new and challenging problem, because the appearance and temporal dynamics of these states vary widely across videos. We select a salient feature set from the compressed domain, namely dominant color ratio and motion intensity, based on the special syntax and content characteristics of soccer videos. We then model the stochastic structure of each state of the game with a set of hidden Markov models. Finally, higher-level transitions are taken into account and dynamic programming techniques are used to obtain the maximum likelihood segmentation of the video sequence. The system achieves a promising classification accuracy of 83.5%, with lightweight computation for feature extraction and model inference, as well as satisfactory accuracy in boundary timing.
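
The final step described above, combining per-state likelihoods with higher-level transitions through dynamic programming, can be illustrated with a minimal sketch. It is not the paper's implementation: the window scores, the transition probabilities, the uniform prior, and the function name `segment_play_break` are all assumptions made for illustration. In a real system the per-window log-likelihoods would come from the hidden Markov models trained for each state.

```python
import math

STATES = ("play", "break")

def segment_play_break(loglik, log_trans, log_prior):
    """Viterbi-style dynamic programming over window labels.

    loglik:    list of dicts, one per window, mapping state -> log-likelihood
               (hypothetical values here; assumed to come from per-state HMMs)
    log_trans: log_trans[a][b] = log P(next window is b | current window is a)
    log_prior: log_prior[s]    = log P(first window is s)
    Returns the label sequence with the highest total log score.
    """
    # best[s]: best log score of any label path ending in state s
    best = {s: log_prior[s] + loglik[0][s] for s in STATES}
    backpointers = []
    for obs in loglik[1:]:
        prev, best, ptr = best, {}, {}
        for s in STATES:
            # Pick the predecessor that maximizes the score of reaching s
            p = max(STATES, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[p] + log_trans[p][s] + obs[s]
            ptr[s] = p
        backpointers.append(ptr)
    # Trace back from the best final state
    path = [max(STATES, key=lambda s: best[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Assumed "sticky" transitions: a window keeps its label with probability 0.9
stay, switch = math.log(0.9), math.log(0.1)
log_trans = {"play": {"play": stay, "break": switch},
             "break": {"play": switch, "break": stay}}
log_prior = {s: math.log(0.5) for s in STATES}

# Made-up scores; the third window weakly favors "break" in isolation
scores = [
    {"play": -1.0, "break": -3.0},
    {"play": -1.0, "break": -3.0},
    {"play": -2.0, "break": -1.5},
    {"play": -1.0, "break": -3.0},
    {"play": -3.0, "break": -1.0},
    {"play": -3.0, "break": -1.0},
]
labels = segment_play_break(scores, log_trans, log_prior)
print(labels)
```

With these assumed numbers, labeling each window independently would mark the third window as break, whereas the transition penalty keeps it as play and places a single boundary before the fifth window.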
