Improved phonetic and lexical speaker recognition through MAP adaptation

High level features such as phone and word n-grams have been shown to be effective for speaker recognition, particularly when used along side traditional acoustic speaker recognition techniques. The applicability of these high-level recognition systems is impeded by the large training data requirements needed to build robust and stable speaker models. This paper describes an extension to an existing phone n-gram based speaker recognition technique, whereby MAP adaptation is used in the speaker model training process. Results obtained for the NIST 2003 Speaker Recognition Extended Data Task indicate that a significant improvement in performance can be gained through the use of this model estimation technique. In our tests, we were able to improve performance over the baseline system, and at the same time, halved the training data requirement. Further experimentation using MAP adaptation on word n-gram models also showed improvement over baseline results, suggesting that the technique could be applied to other multinomial distribution feature sets.

[1]  Sridha Sridharan,et al.  Dependence of GMM adaptation on feature post-processing for speaker recognition , 2003, INTERSPEECH.

[2]  Alvin F. Martin,et al.  The NIST speaker recognition evaluation program , 2005 .

[3]  Joseph P. Campbell,et al.  Phonetic, idiolectal and acoustic speaker recognition , 2001, Odyssey.

[4]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[5]  Marc A. Zissman,et al.  Automatic language identification of telephone speech messages using phoneme recognition and N-gram modeling , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[7]  Douglas A. Reynolds,et al.  An overview of automatic speaker recognition technology , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Joseph P. Campbell,et al.  Gender-dependent phonetic refraction for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Chin-Hui Lee,et al.  Bayesian Adaptive Learning and Map Estimation of HMM , 1996 .