Automatic Audio Genre Classification Based on Support Vector Machine

Audio classification plays an important role in audio indexing, audio analysis, and content-based video retrieval. In this paper, we propose a clip-based support vector machine (SVM) approach that classifies audio signals into six classes: pure speech, music, silence, environmental sound, speech with music, and speech with environmental sound. The classification results are then used to partition a video into homogeneous audio segments, which are in turn used to analyze and retrieve its high-level content. Experimental results show that the proposed system achieves higher classification accuracy than classification systems based on the decision tree (DT), K-nearest neighbor (K-NN), and neural network (NN).
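
The partitioning step can be illustrated with a minimal sketch. It assumes that each clip has already been assigned one of the six class labels (for example by a trained SVM) and that consecutive clips with the same label are merged into one segment. The abstract does not specify the merging rule, so this is only one plausible reading. The function name `segment_clips`, the 1-second clip length, and the sample labels are hypothetical and chosen for illustration.

```python
from itertools import groupby

# The six classes named in the abstract.
CLASSES = (
    "pure speech",
    "music",
    "silence",
    "environmental sound",
    "speech with music",
    "speech with environmental sound",
)


def segment_clips(labels, clip_seconds=1.0):
    """Merge runs of identically labelled clips into segments.

    labels: per-clip class labels in temporal order.
    clip_seconds: assumed clip length (hypothetical, not from the paper).
    Returns a list of (start_time, end_time, label) tuples.
    """
    segments = []
    index = 0  # index of the first clip in the current run
    for label, run in groupby(labels):
        count = sum(1 for _ in run)  # number of clips in this run
        start = index * clip_seconds
        end = (index + count) * clip_seconds
        segments.append((start, end, label))
        index += count
    return segments


# Hypothetical per-clip predictions for a 7-second excerpt.
clip_labels = [
    "silence",
    "pure speech", "pure speech",
    "music", "music", "music",
    "speech with music",
]
segments = segment_clips(clip_labels)
for start, end, label in segments:
    print(f"{start:5.1f}s to {end:5.1f}s  {label}")
```

A practical system would likely smooth isolated misclassified clips before merging, since a single wrong label splits an otherwise homogeneous segment in two; that step is omitted here.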