Fusion of descriptors for speech / music classification

This work addresses soundtrack indexing of multimedia documents. We present a speech/music classification system based on three original features: entropy modulation, stationary segment duration, and number of segments. These features are merged with the classical 4 Hz modulation energy by basic score maximisation. We validate this fusion approach using probability theory and evidence theory. The system is tested on radio corpora. The resulting systems are simple and robust, and could be improved on every corpus without training or adaptation.
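To make the idea of fusion by score maximisation concrete, here is a minimal sketch and not the authors' implementation. It assumes each feature produces one confidence score per class in [0, 1], and that the fused decision is the class holding the single highest score across all features. The function name, the feature keys, and the numeric scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical layout: feature name -> {class label -> confidence score in [0, 1]}
Scores = Dict[str, Dict[str, float]]


def fuse_by_max_score(scores: Scores) -> Tuple[str, str, float]:
    """Return (label, feature, score) for the highest score over all features.

    Assumption: scores from different features are directly comparable.
    """
    best = None
    for feature, by_class in scores.items():
        for label, score in by_class.items():
            if best is None or score > best[2]:
                best = (label, feature, score)
    if best is None:
        raise ValueError("no scores to fuse")
    return best


# Illustrative, made-up scores for one analysis window.
example: Scores = {
    "modulation_energy_4hz": {"speech": 0.81, "music": 0.19},
    "entropy_modulation": {"speech": 0.64, "music": 0.36},
    "segment_duration": {"speech": 0.30, "music": 0.70},
    "number_of_segments": {"speech": 0.55, "music": 0.45},
}

label, feature, score = fuse_by_max_score(example)
print(label, feature, score)
```

In this sketch the most confident feature alone decides the class, which needs no training of a combination rule; whether the scores are comparable across features is an assumption that would have to be checked on real data.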
