论文信息 - Audio segmentation by feature-space clustering using linear discriminant analysis and dynamic programming

Audio segmentation by feature-space clustering using linear discriminant analysis and dynamic programming

We consider the problem of segmenting an audio signal into characteristic regions based on feature-set similarities. In the proposed method, a feature-space representation of the signal is generated; then, sequences of feature-space samples are aggregated into clusters corresponding to distinct signal regions. The clustering of feature sets is improved via linear discriminant analysis (LDA); dynamic programming (DP) is used to derive optimal cluster boundaries. The method avoids the heuristics employed in various feature-space segmentation schemes and is able to derive an optimal segmentation once the LDA and DP cost metrics have been chosen. We demonstrate that the method outperforms typical feature-space approaches described in the literature. We focus on an illustrative example of the basic segmentation task; however, by judicious design of the feature set, the training set, and the dynamic program, the method can be tailored for various applications such as speech/music discrimination, segmentation of audio streams for smart transport, or song structure analysis for thumbnailing.

Jean Laroche | Michael M. Goodwin

[1] Mark Sandler,et al. Segmentation of Musical Signals Using Hidden Markov Models. , 2001 .

[2] Richard O. Duda,et al. Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[3] Malcolm Slaney,et al. Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Jonathan Foote,et al. Automatic audio segmentation using a measure of audio novelty , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[5] Douglas Keislar,et al. Content-Based Classification, Search, and Retrieval of Audio , 1996, IEEE Multim..

[6] George Tzanetakis,et al. Multifeature audio segmentation for browsing and annotation , 1999, Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452).

[7] Keinosuke Fukunaga,et al. Introduction to Statistical Pattern Recognition , 1972 .