A GENERIC SYSTEM FOR AUDIO INDEXING: APPLICATION TO SPEECH/ MUSIC SEGMENTATION AND MUSIC GENRE RECOGNITION

In this paper we present a generic system for audio indexing (classification/segmentation)andapplyittotwousualproblems: speech/ music segmentation and music genre recognition. We first present some requirements for the design of a generic system. The training part of it is based on a succession of four steps: feature extraction, feature selection, feature space transform and statistical modeling. We then propose several approaches for the indexing part depending of the local/ global characteristics of the indexes to be found. In particular we propose the use of segment-statistical models. The system is then applied to two usual problems. The first one is the speech/ music segmentation of a radio stream. The application is developed in a real industrial framework using real world categories and data. The performances obtained for the pure speech/ music classes problem are good. However when considering also the non-pure categories (mixed, bed) the performances of the system drop. The second problem is the music genre recognition. Since the indexes to be found are global, “segment-statistical models” are used leading to results close to the state of the art.

[1]  J. Stephen Downie,et al.  The Music Information Retrieval Evaluation eXchange (MIREX) , 2006 .

[2]  Daniel P. W. Ellis,et al.  Automatic Record Reviews , 2004, ISMIR.

[3]  Malcolm Slaney,et al.  Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Gaël Richard,et al.  Combined Supervised and Unsupervised Approaches for Automatic Segmentation of Radiophonic Audio Streams , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[5]  Jonas Beskow,et al.  Wavesurfer - an open source speech tool , 2000, INTERSPEECH.

[6]  Jeroen Breebaart,et al.  Features for audio and music classification , 2003, ISMIR.

[7]  Julien Pinquier,et al.  Audio indexing: primary components retrieval , 2006, Multimedia Tools and Applications.

[8]  Ichiro Fujinaga,et al.  ACE: A Framework for Optimizing Music Classification , 2005, ISMIR.

[9]  Guillaume Gravier,et al.  The ESTER phase II evaluation campaign for the rich transcription of French broadcast news , 2005, INTERSPEECH.

[10]  Michael J. Carey,et al.  A comparison of features for speech, music discrimination , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[11]  Alexander Lerch,et al.  A HIERARCHICAL APPROACH TO AUTOMATIC MUSICAL GENRE CLASSIFICATION , 2003 .

[12]  Liming Chen,et al.  Robust speech music discrimination using spectrum's first order statistics and neural networks , 2003, Seventh International Symposium on Signal Processing and Its Applications, 2003. Proceedings..

[13]  J. Jośe A HIERARCHICAL APPROACH TO AUTOMATIC MUSICAL GENRE CLASSIFICATION , 2003 .

[14]  Lie Lu,et al.  Music type classification by spectral contrast feature , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[15]  G. Peeters Automatic Classification of Large Musical Instrument Databases Using Hierarchical Classifiers with Inertia Ratio Maximization , 2003 .

[16]  Xavier Rodet,et al.  Toward Automatic Music Audio Summary Generation from Signal Analysis , 2002, ISMIR.

[17]  François Pachet,et al.  Representing Musical Genre: A State of the Art , 2003 .

[18]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[19]  George Tzanetakis,et al.  MARSYAS: a framework for audio analysis , 1999, Organised Sound.

[20]  Ichiro Fujinaga,et al.  jAudio: An Feature Extraction Library , 2005, ISMIR.

[21]  John Saunders,et al.  Real-time discrimination of broadcast speech/music , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[22]  Stephen R. Garner,et al.  WEKA: The Waikato Environment for Knowledge Analysis , 1996 .

[23]  François Pachet,et al.  Automatic extraction of music descriptors from acoustic signals , 2004, ISMIR.

[24]  Ichiro Fujinaga,et al.  Automatic Genre Classification Using Large High-Level Musical Feature Sets , 2004, ISMIR.