A background music detection method based on robust feature extraction

We propose a music segment detection method for audio signals. Unlike many existing methods, ours focuses specifically on background-music detection, that is, detecting music that plays in the background of the main sounds. This task is important because music in visual materials such as TV programs is almost always overlapped by speech or other environmental sounds. Our method consists of feature extraction, dimension reduction, and statistical discrimination steps. For each step, we analyzed a set of candidate methods to maximize the detection accuracy. With a simple post-processing step, we achieved a framewise error rate as low as 8% even when the mixed speech was 10 dB louder than the target music.
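
The evaluation condition and metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the signals were mixed, which post-processing was used, or how frames were labeled, so everything below is an assumption made for illustration: the 10 dB level difference is read as a ratio of average signal powers, the post-processing is a sliding majority vote over binary framewise decisions, and the function names (`mix_at_level_difference`, `majority_smooth`, `framewise_error_rate`) are hypothetical.

```python
import numpy as np


def mix_at_level_difference(speech, music, speech_over_music_db):
    """Scale `music` so the speech power exceeds the music power by the
    given number of dB, then add the two signals.
    Assumption: "louder by X dB" means a ratio of average powers."""
    p_speech = np.mean(speech ** 2)
    p_music = np.mean(music ** 2)
    target_music_power = p_speech / (10.0 ** (speech_over_music_db / 10.0))
    gain = np.sqrt(target_music_power / p_music)
    return speech + gain * music, gain


def majority_smooth(labels, width=3):
    """Sliding majority vote over binary framewise labels.
    `width` must be odd; edges are padded by repeating the end labels.
    Assumption: a stand-in for the unspecified post-processing step."""
    labels = np.asarray(labels, dtype=int)
    half = width // 2
    padded = np.pad(labels, half, mode="edge")
    # Count the number of "music" frames inside each window.
    counts = np.convolve(padded, np.ones(width, dtype=int), mode="valid")
    return (2 * counts > width).astype(int)


def framewise_error_rate(predicted, reference):
    """Fraction of frames whose predicted label differs from the reference."""
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    return float(np.mean(predicted != reference))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)  # synthetic stand-in signals
    music = rng.standard_normal(16000)
    mixture, gain = mix_at_level_difference(speech, music, 10.0)

    reference = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = music present
    raw = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]        # hypothetical classifier output
    smoothed = majority_smooth(raw, width=3)
    print(framewise_error_rate(raw, reference))
    print(framewise_error_rate(smoothed, reference))
```

In this toy example the smoothing removes isolated single-frame errors, which is the kind of effect a simple post-processing step is typically meant to have; it says nothing about the error rates reported above.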
