Paper information - Digital Object Identifier (DOI) 10.1007/s00530-002-0065-0 Multimedia Systems

Abstract. Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification, which employs support vector machines (SVMs). Five audio classes are considered: silence, music, background sound, pure speech, and non-pure speech, which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We evaluated the performance of SVMs on classifying different pairs of audio types with testing units of different lengths, and compared the performance of SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) classifiers. We also evaluated the effectiveness of some newly proposed features. Experiments on a database of about 4 hours of audio data show that the proposed classifier is very efficient for audio classification and segmentation. They also show that the accuracy of the SVM-based method is much better than that of the methods based on KNN and GMM.
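The segmentation step described in the abstract (label each sub-segment, then treat runs of identical labels as segments) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the 1-second unit length, the single-value feature vectors, and the threshold-based `toy_classifier` (a stand-in for a trained SVM) are all assumptions made for the example.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple

# The five classes named in the abstract.
CLASSES = ("silence", "music", "background sound", "pure speech", "non-pure speech")


def segment_stream(
    sub_segments: Sequence[Sequence[float]],
    classify: Callable[[Sequence[float]], str],
    unit_seconds: float = 1.0,  # assumed sub-segment length, not taken from the source
) -> List[Tuple[float, float, str]]:
    """Label each sub-segment, then merge runs of equal labels into segments.

    Returns a list of (start_seconds, end_seconds, label) tuples.
    """
    labels = [classify(features) for features in sub_segments]
    for label in labels:
        if label not in CLASSES:
            raise ValueError(f"unknown class: {label!r}")

    segments = []
    position = 0  # index of the first sub-segment in the current run
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((position * unit_seconds, (position + length) * unit_seconds, label))
        position += length
    return segments


def toy_classifier(features: Sequence[float]) -> str:
    """Hypothetical stand-in for a trained SVM: thresholds one energy-like value."""
    energy = features[0]
    if energy < 0.1:
        return "silence"
    if energy < 0.5:
        return "pure speech"
    return "music"


if __name__ == "__main__":
    # Five 1-second sub-segments, each described by one made-up feature value.
    stream = [[0.0], [0.05], [0.3], [0.3], [0.9]]
    for start, end, label in segment_stream(stream, toy_classifier):
        print(f"{start:.1f}s to {end:.1f}s: {label}")
```

In practice `classify` would wrap a trained model (SVM, KNN, or GMM) applied to real audio features; keeping it as a plain callable lets the same merging logic be reused when comparing classifiers.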
