Audio segmentation by feature-space clustering using linear discriminant analysis and dynamic programming

We consider the problem of segmenting an audio signal into characteristic regions based on feature-set similarities. In the proposed method, a feature-space representation of the signal is generated; then, sequences of feature-space samples are aggregated into clusters corresponding to distinct signal regions. The clustering of feature sets is improved via linear discriminant analysis (LDA); dynamic programming (DP) is used to derive optimal cluster boundaries. The method avoids the heuristics employed in various feature-space segmentation schemes and is able to derive an optimal segmentation once the LDA and DP cost metrics have been chosen. We demonstrate that the method outperforms typical feature-space approaches described in the literature. We focus on an illustrative example of the basic segmentation task; however, by judicious design of the feature set, the training set, and the dynamic program, the method can be tailored for various applications such as speech/music discrimination, segmentation of audio streams for smart transport, or song structure analysis for thumbnailing.
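As an illustration of the two stages named above, the following is a minimal sketch, not the implementation evaluated in this paper. It assumes a labeled training set for the LDA stage, a fixed number of segments, and a within-segment squared-error cost for the dynamic program. The function names `lda_projection` and `dp_segment` are hypothetical, and the paper's actual LDA and DP cost metrics may differ.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher LDA: return a (d, n_components) matrix whose columns maximize
    between-class scatter relative to within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of pinv(Sw) @ Sb, ordered by decreasing eigenvalue.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

def dp_segment(Z, n_segments):
    """Split the frame sequence Z (T x d) into n_segments contiguous segments
    minimizing total squared deviation from each segment mean (an assumed
    cost). Returns the start index of each segment."""
    Z = np.asarray(Z, dtype=float)
    T = len(Z)
    # Prefix sums give the cost of any segment in O(1).
    c1 = np.vstack([np.zeros(Z.shape[1]), np.cumsum(Z, axis=0)])
    c2 = np.concatenate([[0.0], np.cumsum((Z ** 2).sum(axis=1))])

    def cost(i, j):  # cost of the segment covering frames i .. j-1
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - (s @ s) / (j - i)

    # best[k, j]: minimum cost of covering the first j frames with k segments.
    best = np.full((n_segments + 1, T + 1), np.inf)
    back = np.zeros((n_segments + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, T + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    back[k, j] = i
    # Backtrack from the final frame to recover the segment starts.
    starts = []
    j = T
    for k in range(n_segments, 0, -1):
        j = int(back[k, j])
        starts.append(j)
    return starts[::-1]
```

In use, one would project the feature frames with `Z = X @ lda_projection(X_train, y_train, n_components)` and then call `dp_segment(Z, n_segments)`. Once the cost is fixed, the dynamic program returns a globally optimal set of boundaries for that cost, so no thresholds or peak-picking heuristics are involved. The triple loop is O(K T^2) and is meant for clarity rather than speed.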
