This report proposes a method for detecting sound events in a basketball game, with a focus on cheering. Mel-frequency cepstral coefficient (MFCC) features are used to distinguish cheering from speech and other confusable sounds. The MFCC features are fed into a neural network (NN), which classifies them into three classes: cheering, speech, and other. To improve on the MFCC-NN classifier, a measure of temporal spectral variation is proposed, defined as the entropy of the linear predictive coding (LPC) coefficients. Normalized energy is also used to eliminate false alarms caused by background noise. The outputs of these three channels are fused, and postprocessing is applied to obtain robust results. For other events, such as dribbling, a template-matching approach is proposed. Experiments show that the methods perform well on very difficult soundtracks. The described method can be used for basketball video content retrieval and highlight extraction.
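The two auxiliary channels and the fusion step can be illustrated with a minimal, standard-library Python sketch. It is not the report's implementation: the abstract does not give the exact definition of LPC coefficient entropy, so the histogram-based `lpc_entropy` below is one plausible reading, and the `fuse` rule, its thresholds, and the choice to treat low entropy as evidence for cheering are all hypothetical placeholders. The MFCC-NN classifier is represented only by its output label.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a list of samples into (possibly overlapping) frames."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def normalized_energy(frames):
    """Per-frame energy divided by the largest frame energy in the clip."""
    energies = [sum(s * s for s in f) for f in frames]
    peak = max(energies) if energies else 0.0
    return [e / peak if peak > 0 else 0.0 for e in energies]

def lpc(frame, order):
    """LPC coefficients a[1..order] via autocorrelation and Levinson-Durbin."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0:                      # silent frame: no prediction possible
        return a[1:]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
        if err <= 0:
            break
    return a[1:]

def histogram_entropy(values, bins=8, lo=-2.0, hi=2.0):
    """Shannon entropy (bits) of a fixed-range histogram of the values."""
    counts = [0] * bins
    for v in values:
        idx = int((min(max(v, lo), hi) - lo) / (hi - lo) * bins)
        counts[min(idx, bins - 1)] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def lpc_entropy(frames, order=4, bins=8):
    """ASSUMED definition: mean entropy of each LPC coefficient's track
    across the frames of a window (higher = more spectral variation)."""
    coeffs = [lpc(f, order) for f in frames]
    per_coeff = [histogram_entropy([c[k] for c in coeffs], bins)
                 for k in range(order)]
    return sum(per_coeff) / order

def fuse(nn_label, entropy, energy, entropy_max=1.0, energy_min=0.1):
    """HYPOTHETICAL fusion rule: keep an NN 'cheering' decision only if the
    window is loud enough and its spectrum is steady enough."""
    if nn_label != "cheering":
        return nn_label
    if entropy <= entropy_max and energy >= energy_min:
        return "cheering"
    return "others"
```

For a window of frames, a caller would compute `lpc_entropy(frames)` and the mean of `normalized_energy(frames)`, then pass both to `fuse` together with the classifier's label. The thresholds would need to be tuned on labeled audio.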