Speaker Verification Based on Different Vector Quantization Techniques with Gaussian Mixture Models

The introduction of Gaussian Mixture Models (GMMs) in the field of speaker verification has led to very good results. This paper illustrates an evolution in state-of-the-art speaker verification by highlighting the contribution of a recently established information-theoretic vector quantization technique. We explore the novel application of three vector quantization algorithms, namely K-means, Linde-Buzo-Gray (LBG) and Information Theoretic Vector Quantization (ITVQ), for efficient speaker verification. The Expectation Maximization (EM) algorithm used to train GMMs requires a prohibitively large number of iterations to converge. In this paper, K-means, LBG and ITVQ were tested as comparable alternatives to EM. The GMM-ITVQ algorithm was found to be the most efficient alternative to GMM-EM, giving correct classification rates at a level similar to that of GMM-EM. Finally, representative performance benchmarks and system behaviour experiments on NIST SRE corpora are presented.
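As a rough illustration of replacing EM with a vector quantizer, the sketch below builds a diagonal-covariance GMM directly from a K-means partition of the feature vectors, with no EM iterations. It is a minimal sketch and not the paper's implementation: K-means stands in for all three quantizers (LBG and ITVQ are not shown), the diagonal-covariance choice and the variance floor are assumptions, and the function names `kmeans_codebook`, `gmm_from_partition` and `avg_log_likelihood` are hypothetical.

```python
import numpy as np


def kmeans_codebook(features, n_codewords, n_iters=20, seed=0):
    """Plain K-means (Lloyd iterations). Returns (codebook, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords with distinct, randomly chosen feature vectors.
    start = rng.choice(len(features), n_codewords, replace=False)
    codebook = features[start].astype(float)
    for _ in range(n_iters):
        # Squared Euclidean distance from every vector to every codeword.
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_codewords):
            members = features[labels == k]
            if len(members) > 0:  # an empty cell keeps its old codeword
                codebook[k] = members.mean(axis=0)
    # Recompute labels so they match the final codebook.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return codebook, dists.argmin(axis=1)


def gmm_from_partition(features, labels, n_components, var_floor=1e-3):
    """Estimate GMM weights, means and diagonal variances from a hard partition.

    Assumption: one Gaussian per quantizer cell, diagonal covariance,
    variances clipped from below at var_floor.
    """
    dim = features.shape[1]
    weights = np.zeros(n_components)
    means = np.zeros((n_components, dim))
    variances = np.full((n_components, dim), var_floor)
    for k in range(n_components):
        members = features[labels == k]
        if len(members) == 0:
            continue  # empty cell: weight stays 0 and is ignored when scoring
        weights[k] = len(members) / len(features)
        means[k] = members.mean(axis=0)
        variances[k] = np.maximum(members.var(axis=0), var_floor)
    return weights, means, variances


def avg_log_likelihood(features, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    keep = weights > 0  # drop empty components to avoid log(0)
    w, mu, var = weights[keep], means[keep], variances[keep]
    diff = features[:, None, :] - mu[None, :, :]
    log_comp = -0.5 * (
        np.log(2.0 * np.pi * var).sum(axis=1)[None, :]
        + (diff ** 2 / var[None, :, :]).sum(axis=2)
    )
    log_joint = np.log(w)[None, :] + log_comp
    # Log-sum-exp over components for numerical stability.
    peak = log_joint.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return float(frame_ll.mean())
```

In a verification setting, one such model would be built for the claimed speaker and one for a background population, and a test utterance would be accepted when the difference of the two average log-likelihoods exceeds a threshold. That scoring scheme is the conventional likelihood-ratio setup and is assumed here rather than taken from the text above.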
