Fishervoice and semi-supervised speaker clustering

Speaker subspace modeling has become increasingly important in speaker recognition, diarization, and clustering. Principal component analysis (PCA) is a popular linear subspace learning technique and the approach that represents an arbitrary utterance or speaker as a linear combination of a set of basis voices based on PCA is known as the eigenvoice approach. In this paper, a novel technique, namely the fishervoice approach, is proposed. The fishervoice approach is based on linear discriminant analysis, another successful linear subspace learning technique that provides an optimized low-dimensional representation of utterances or speakers with focus on the most discriminative basis voices. We apply the fishervoice approach to speaker clustering in a semi-supervised manner and show that the fishervoice approach significantly outperforms the eigenvoice approach in all our experiments on the GALE Mandarin dataset.

[1]  I. Jolliffe Principal Component Analysis , 2002 .

[2]  Roland Kuhn,et al.  Eigenvoices for speaker adaptation , 1998, ICSLP.

[3]  Marijn Huijbregts,et al.  The ICSI RT07s Speaker Diarization System , 2007, CLEAR.

[4]  Herbert Gish,et al.  Clustering speakers by their voices , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[5]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[6]  Pietro Laface,et al.  Stream-based speaker segmentation using speaker factors and eigenvoices , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[7]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[8]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[9]  Itshak Lapidot,et al.  Unsupervised speaker recognition based on competition between self-organizing maps , 2002, IEEE Trans. Neural Networks.

[10]  David G. Stork,et al.  Pattern Classification , 1973 .

[11]  J. H. Ward Hierarchical Grouping to Optimize an Objective Function , 1963 .

[12]  Jean-Luc Gauvain,et al.  Multistage speaker diarization of broadcast news , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[14]  Ponani S. Gopalakrishnan,et al.  Clustering via the Bayesian information criterion with applications in speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[15]  Douglas A. Reynolds,et al.  An overview of automatic speaker diarization systems , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[16]  H Hermansky,et al.  Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.

[17]  Yi Liu,et al.  Recent advances in the IBM GALE Mandarin transcription system , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[18]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.

[19]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[20]  Douglas A. Reynolds,et al.  Blind clustering of speech utterances based on speaker and language characteristics , 1998, ICSLP.

[21]  Roland Kuhn,et al.  Rapid speaker adaptation in eigenvoice space , 2000, IEEE Trans. Speech Audio Process..

[22]  David J. Kriegman,et al.  Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection , 1996, ECCV.

[23]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[24]  Roland Kuhn,et al.  Speaker identification and verification using eigenvoices , 2000, INTERSPEECH.

[25]  Hsin-Min Wang,et al.  Speaker clustering of speech utterances using a voice characteristic reference space , 2004, INTERSPEECH.

[26]  Jr. J.P. Campbell,et al.  Speaker recognition: a tutorial , 1997, Proc. IEEE.

[27]  G. Ruske,et al.  Robust speaker clustering in eigenspace , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[28]  F. Kubala,et al.  Automatic Speaker Clustering , 1997 .

[29]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .