Real-time speaker identification and verification

In speaker identification, most of the computation comes from the distance or likelihood calculations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models, and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We also apply the algorithms to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 for VQ-based modeling, with minor degradation in identification accuracy, and 34:1 for GMM-based modeling. An equal error rate of 7% can be reached in 0.84 s on average when the test utterance is 30.4 s long.
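The two reductions described above, pre-quantizing the test sequence and pruning unlikely speakers, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`kmeans`, `identify`), the parameters (`prequant_size`, `block`, `keep_fraction`), the plain k-means pre-quantizer with unweighted centroids, and the fixed-fraction pruning rule are all assumptions chosen for illustration. Each speaker model is assumed to be a VQ codebook, and the match score is the accumulated squared distance from each test vector to its nearest code vector.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_dist(v, codebook):
    """Distance from v to its nearest code vector in a speaker codebook."""
    return min(sq_dist(v, c) for c in codebook)

def kmeans(vectors, k, iters=10, seed=0):
    """Plain k-means, used here as an assumed pre-quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda i: sq_dist(v, centroids[i]))
            clusters[j].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def identify(test_vectors, speaker_codebooks,
             prequant_size=8, block=2, keep_fraction=0.5):
    """Return the name of the best-matching speaker.

    Assumed scheme: (1) replace the test sequence by at most
    prequant_size centroids, (2) score the remaining speakers on
    `block` vectors at a time, and (3) after each block keep only the
    best keep_fraction of the speakers still active.
    """
    test = list(test_vectors)
    if len(test) > prequant_size:
        test = kmeans(test, prequant_size)  # fewer vectors to match

    scores = {name: 0.0 for name in speaker_codebooks}
    for start in range(0, len(test), block):
        if len(scores) == 1:
            break  # only one candidate left, no more matching needed
        for v in test[start:start + block]:
            for name in scores:
                scores[name] += nearest_dist(v, speaker_codebooks[name])
        keep = max(1, int(len(scores) * keep_fraction))
        survivors = sorted(scores, key=scores.get)[:keep]
        scores = {name: scores[name] for name in survivors}
    return min(scores, key=scores.get)
```

With N speakers, T test vectors, and codebooks of size K, a full search costs on the order of N·T·K distance computations. In this sketch, pre-quantization caps T at `prequant_size` and pruning shrinks N after each block, which is where the speed-up comes from. The cost of the pre-quantization step itself is not counted in that estimate.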
