Paper information - A GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION

A GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION

In this paper we present a generic system for audio indexing (classification/segmentation) and apply it to two usual problems: speech/music segmentation and music genre recognition. We first present some requirements for the design of a generic system. Its training part is based on a succession of four steps: feature extraction, feature selection, feature space transform, and statistical modeling. We then propose several approaches for the indexing part, depending on the local/global characteristics of the indexes to be found. In particular, we propose the use of segment-statistical models. The system is then applied to two usual problems. The first is the speech/music segmentation of a radio stream. The application is developed in a real industrial framework using real-world categories and data. The performance obtained for the pure speech/music classes problem is good. However, when the non-pure categories (mixed, bed) are also considered, the performance of the system drops. The second problem is music genre recognition. Since the indexes to be found are global, “segment-statistical models” are used, leading to results close to the state of the art.
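The four training steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the three frame features, the between/within variance ranking used for selection, the PCA transform, and the diagonal Gaussian class models are all assumptions chosen for brevity, and every function name (`extract_features`, `select_features`, `fit_transform`, `fit_gaussians`, `classify_frames`, `make_dataset`) is hypothetical. The data is synthetic (a noisy tone against white noise), not speech or music.

```python
import numpy as np

def extract_features(signal, frame_len=256, hop=128):
    """Step 1 (assumed features): log-energy, zero-crossing rate, spectral centroid."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.empty((n_frames, 3))
    bins = np.arange(frame_len // 2 + 1)
    window = np.hanning(frame_len)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        mag = np.abs(np.fft.rfft(frame * window))
        feats[i, 0] = np.log(np.sum(frame ** 2) + 1e-12)
        feats[i, 1] = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        feats[i, 2] = np.sum(bins * mag) / (np.sum(mag) + 1e-12)
    return feats

def select_features(X, y, k):
    """Step 2 (assumed criterion): keep the k features with the highest
    between-class to within-class variance ratio."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - overall) ** 2
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    return np.argsort(between / (within + 1e-12))[::-1][:k]

def fit_transform(X):
    """Step 3 (assumed transform): standardize, then rotate onto PCA axes."""
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-12
    _, _, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    return mean, std, vt.T

def apply_transform(X, mean, std, basis):
    return ((X - mean) / std) @ basis

def fit_gaussians(Z, y):
    """Step 4 (assumed model): one diagonal Gaussian per class."""
    return {c: (Z[y == c].mean(axis=0), Z[y == c].var(axis=0) + 1e-6)
            for c in np.unique(y)}

def log_likelihood(Z, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (Z - mean) ** 2 / var, axis=1)

def classify_frames(Z, models):
    """Local (per-frame) decision: pick the class with the highest log-likelihood."""
    classes = sorted(models)
    scores = np.stack([log_likelihood(Z, *models[c]) for c in classes], axis=1)
    return np.array(classes)[np.argmax(scores, axis=1)]

def make_dataset(seed, sr=8000, n=4000):
    """Synthetic stand-in data: class 0 is a noisy 200 Hz tone, class 1 is white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / sr
    tone = np.sin(2 * np.pi * 200 * t + rng.uniform(0, 2 * np.pi))
    tone = tone + 0.01 * rng.standard_normal(n)
    noise = rng.standard_normal(n)
    X0, X1 = extract_features(tone), extract_features(noise)
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(len(X0), int), np.ones(len(X1), int)])
    return X, y

# Train on one synthetic recording pair, evaluate on another.
X_train, y_train = make_dataset(seed=0)
X_test, y_test = make_dataset(seed=1)
keep = select_features(X_train, y_train, k=2)
mean, std, basis = fit_transform(X_train[:, keep])
models = fit_gaussians(apply_transform(X_train[:, keep], mean, std, basis), y_train)
pred = classify_frames(apply_transform(X_test[:, keep], mean, std, basis), models)
accuracy = np.mean(pred == y_test)
print(f"frame accuracy: {accuracy:.2f}")
```

The sketch makes a local, per-frame decision only. A global index such as genre would need a decision over a whole segment or track, which is the role the abstract assigns to segment-statistical models; those are not reproduced here.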

Geoffroy Peeters
