Phonetic class-based speaker verification

Phonetic Class-based Speaker Verification (PCBV) is a natu- ral refinement of the traditional single Gaussian Mixture Model (Single GMM) scheme. The aim is to accurately model the voice characteristics of a user on a per-phonetic class basis. The paper describes briefly the implementation of a representation of the voice characteristics in a hierarchy of phonetic classes. We present a framework to easily study the effect of the mod- eling on the PCBV. A thorough study of the effect of the mod- eling complexity, the amount of enrollment data and noise con- ditions is presented. It is shown that Phoneme-based Verifica- tion (PBV), a special case of PCBV, is the optimal modeling scheme and consistently outperforms the state-of-the-art Single GMM modeling even in noisy environments. PBV achieves to relative error rate reduction while cutting the speaker model size by and CPU by .

[1]  Larry P. Heck,et al.  Handset-dependent background models for robust text-independent speaker recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Michael J. Carey,et al.  A speaker verification system using alpha-nets , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[3]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[4]  John H. L. Hansen,et al.  Discrete-Time Processing of Speech Signals , 1993 .

[5]  Delphine Charlet,et al.  Confidence measure and incremental adaptation for the rejection of incorrect data , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[6]  Sadaoki Furui,et al.  Concatenated phoneme models for text-variable speaker recognition , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Roland Auckenthaler,et al.  Improving a GMM speaker verification system by phonetic weighting , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8]  Stéphane H. Maes,et al.  Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition , 2003, IEEE Trans. Speech Audio Process..

[9]  Larry P. Heck,et al.  On-line unsupervised adaptation in speaker verification , 2000, INTERSPEECH.

[10]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[11]  Larry Heck Unsupervised On-Line Adaptation in Speaker Verification , 2000 .

[12]  Larry P. Heck,et al.  A model-based transformational approach to robust speaker recognition , 2000, INTERSPEECH.

[13]  Vassilios Digalakis,et al.  Genones: generalized mixture tying in continuous hidden Markov model-based speech recognizers , 1996, IEEE Trans. Speech Audio Process..

[14]  Aaron E. Rosenberg,et al.  Speaker background models for connected digit password speaker verification , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.