A regularized kernel-based approach to unsupervised audio segmentation

We introduce a regularized kernel-based rule for unsupervised change detection, built on a simplified version of the recently proposed kernel Fisher discriminant ratio. Compared with other kernel-based change detectors in the literature, the proposed test statistic is easier to compute and has a known asymptotic distribution, which can be used to set the false-alarm rate a priori. We apply the technique to segmenting audio tracks from TV shows, both for segmentation into semantically homogeneous sections (applause, movie, music, etc.) and for speaker diarization within the speech sections. On these tasks, the proposed approach outperforms other kernel-based tests and is competitive with a standard supervised HMM-based alternative.
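To make the idea of a regularized Fisher-type change statistic concrete, here is a minimal sketch in Python. It is not the exact statistic of the paper: it approximates a Gaussian kernel with random Fourier features (an assumption made here to keep the example short), computes a regularized Fisher discriminant ratio between two adjacent windows in that finite-dimensional feature space, and omits the recentering and rescaling needed to obtain a known asymptotic distribution. The function names, the window length, the bandwidth, and the regularization value `gamma` are all hypothetical choices for illustration.

```python
import numpy as np


def rff_features(x, n_features=100, bandwidth=1.0, seed=0):
    """Random Fourier features approximating a Gaussian kernel.

    Assumption: used as a finite-dimensional stand-in for the kernel
    feature map; the paper's statistic is defined with the kernel itself.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=1.0 / bandwidth, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)


def regularized_fisher_ratio(phi_a, phi_b, gamma=1e-2):
    """Regularized Fisher discriminant ratio between two feature samples."""
    n_a, n_b = len(phi_a), len(phi_b)
    n = n_a + n_b
    # Difference of the two empirical mean elements.
    delta = phi_b.mean(axis=0) - phi_a.mean(axis=0)
    # Pooled within-sample covariance, weighted by sample proportions.
    cov_a = np.cov(phi_a, rowvar=False, bias=True)
    cov_b = np.cov(phi_b, rowvar=False, bias=True)
    pooled = (n_a / n) * cov_a + (n_b / n) * cov_b
    # Ridge term gamma * I keeps the covariance invertible.
    reg = pooled + gamma * np.eye(pooled.shape[0])
    return float((n_a * n_b / n) * (delta @ np.linalg.solve(reg, delta)))


def scan_for_change(x, window=50, gamma=1e-2, **feature_kwargs):
    """Score every candidate change point by comparing adjacent windows."""
    phi = rff_features(x, **feature_kwargs)
    scores = np.full(len(x), np.nan)
    for t in range(window, len(x) - window + 1):
        scores[t] = regularized_fisher_ratio(
            phi[t - window:t], phi[t:t + window], gamma
        )
    return scores


if __name__ == "__main__":
    # Synthetic two-dimensional signal with a mean shift at sample 150.
    rng = np.random.default_rng(1)
    signal = np.vstack([
        rng.normal(0.0, 1.0, size=(150, 2)),
        rng.normal(3.0, 1.0, size=(150, 2)),
    ])
    scores = scan_for_change(signal)
    print("highest score at sample", int(np.nanargmax(scores)))
```

In this sketch a change is declared where the score peaks or exceeds a threshold. Choosing that threshold from an asymptotic distribution, as the abstract describes, would require the normalization that is deliberately left out here.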
