This paper proposes a unified framework for detecting cheering events in the audio streams of live sports games. First, a sliding window moving from the start to the end of the stream pre-segments the audio into short segments. Second, several kinds of audio features are extracted to represent the different sounds in each segment. Third, a Gaussian mixture model (GMM) is used as the classifier to detect cheering events. Finally, in addition to widely used smoothing rules, we develop a new boundary-seeking smoothing algorithm that overcomes the shortcomings of conventional sliding-window based analysis and eliminates false alarms caused by background noise. Integrating these techniques yields an average F-value of 82.99% on the cheering detection task, evaluated on eleven games from five kinds of sports. We also discuss the complementarity of the different audio features for cheering event detection and compare our results with an HMM-based event detection framework. Based on this study, we conclude that for long-term audio event detection such as cheering event detection, the sliding-window based framework gives more satisfactory results.
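The pre-segmentation and smoothing stages can be illustrated with a minimal sketch. It assumes per-window binary labels (1 for cheering, 0 otherwise) are already available from some classifier, so feature extraction and the GMM are omitted. The majority-vote rule shown is a generic smoothing rule, not the boundary-seeking algorithm developed in the paper, and the function names `sliding_windows`, `majority_smooth`, and `runs_to_events` are hypothetical.

```python
from typing import List, Tuple


def sliding_windows(n_samples: int, win: int, hop: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of windows moving from start to end."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    # Only full windows are kept; a stream shorter than `win` yields no windows.
    return [(s, s + win) for s in range(0, n_samples - win + 1, hop)]


def majority_smooth(labels: List[int], radius: int = 1) -> List[int]:
    """Generic smoothing (an assumption, not the paper's method):
    relabel each window by strict majority vote over its neighbours."""
    out = []
    for i in range(len(labels)):
        lo, hi = max(0, i - radius), min(len(labels), i + radius + 1)
        neigh = labels[lo:hi]
        out.append(1 if 2 * sum(neigh) > len(neigh) else 0)
    return out


def runs_to_events(labels: List[int]) -> List[Tuple[int, int]]:
    """Merge consecutive positive windows into (first, one-past-last) index pairs."""
    events, start = [], None
    for i, v in enumerate(labels):
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(labels)))
    return events


# Hypothetical raw classifier output: one isolated false alarm and one short gap.
raw = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0]
smoothed = majority_smooth(raw, radius=1)
events = runs_to_events(smoothed)
```

In this toy input, the vote removes the isolated positive window and fills the one-window gap, so a single event remains. A plain vote of this kind cannot refine event boundaries beyond the window grid, which is the sort of limitation a boundary-seeking step is meant to address.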