Paper information - Learning essential speaker sub-space using hetero-associative neural networks for speaker clustering


In this paper, we present a novel approach to speaker clustering that uses a hetero-associative neural network (HANN) to compute very low-dimensional (in our case, 1-dimensional) speaker-discriminatory features in a data-driven manner. A HANN trained to map the input feature space onto speaker labels through a bottleneck hidden layer is expected to learn a very low-dimensional feature subspace that essentially contains the speaker information. These low-dimensional features are then used in a simple k-means clustering algorithm to obtain the speaker segmentation. Evaluation of this approach on a database of real-life conversational speech from call centers shows that the clustering performance achieved is similar to that of state-of-the-art systems, although our approach uses just 1-dimensional features. Augmenting these features with the traditional mel-frequency cepstral coefficient (MFCC) features in the state-of-the-art system resulted in improved clustering performance.
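The pipeline described in the abstract (a network with a 1-dimensional bottleneck trained to predict speaker labels, followed by k-means on the bottleneck output) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy 2-D Gaussian frames standing in for acoustic features, the single tanh bottleneck unit, the softmax output layer, the plain SGD training loop, and all function names and hyperparameters. The sketch also trains and clusters on the same labeled toy frames, which is a simplification; the abstract does not say how the training data are obtained.

```python
import math
import random


def make_toy_frames(rng, n_per_speaker=40):
    """Toy 2-D frames for two speakers (hypothetical stand-ins for MFCC vectors)."""
    frames, labels = [], []
    for speaker, centre in enumerate([(-2.0, -2.0), (2.0, 2.0)]):
        for _ in range(n_per_speaker):
            frames.append([rng.gauss(c, 0.5) for c in centre])
            labels.append(speaker)
    return frames, labels


def bottleneck_feature(x, w, b):
    """1-D bottleneck activation: tanh of a linear projection of the frame."""
    return math.tanh(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_bottleneck_net(frames, labels, n_speakers, rng, epochs=50, lr=0.1):
    """Train input -> 1 tanh unit -> softmax over speakers with plain SGD."""
    dim = len(frames[0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]  # input -> bottleneck
    b = 0.0
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_speakers)]  # bottleneck -> logits
    c = [0.0] * n_speakers
    order = list(range(len(frames)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = frames[i], labels[i]
            h = bottleneck_feature(x, w, b)
            logits = [v[k] * h + c[k] for k in range(n_speakers)]
            m = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            # Cross-entropy gradient w.r.t. the logits: softmax minus one-hot.
            g = [e / total - (1.0 if k == y else 0.0) for k, e in enumerate(exps)]
            # Back-propagate through the bottleneck before updating v.
            g_h = sum(g[k] * v[k] for k in range(n_speakers))
            g_a = g_h * (1.0 - h * h)
            for k in range(n_speakers):
                v[k] -= lr * g[k] * h
                c[k] -= lr * g[k]
            for j in range(dim):
                w[j] -= lr * g_a * x[j]
            b -= lr * g_a
    return w, b


def kmeans_1d(values, k=2, iters=20):
    """Simple k-means on scalars (k >= 2), centroids initialised across the range."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: abs(val - centroids[j]))
                  for val in values]
        for j in range(k):
            members = [val for val, a in zip(values, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return assign, centroids


rng = random.Random(0)
frames, labels = make_toy_frames(rng)
w, b = train_bottleneck_net(frames, labels, n_speakers=2, rng=rng)
features = [bottleneck_feature(x, w, b) for x in frames]  # one scalar per frame
assign, centroids = kmeans_1d(features, k=2)

# Cluster indices are arbitrary, so score agreement up to a label swap.
agree = sum(a == y for a, y in zip(assign, labels))
accuracy = max(agree, len(labels) - agree) / len(labels)
print(f"agreement between clusters and toy speaker labels: {accuracy:.2f}")
```

Because the output layer only sees the single bottleneck value, the network has to pack whatever separates the speakers into that one number, which is why plain k-means on scalars can then separate the two toy speakers. Real conversational speech would need a larger network and real acoustic features; this sketch makes no claim about the configuration used in the paper.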

Karthik Visweswariah | Shajith Ikbal
