Text-independent speaker recognition by combining speaker-specific GMM with speaker adapted syllable-based HMM

We previously presented a text-independent speaker recognition method that combines a speaker-specific Gaussian mixture model (GMM) with a syllable-based hidden Markov model (HMM) adapted by MLLR or MAP (S. Nakagawa et al., Proc. Eurospeech, p.3017-3020, 2003). In this paper, we evaluate the robustness of this method to changes in speaking style. We conducted a speaker identification experiment on an NTT database consisting of sentences uttered at three speaking rates (normal, fast and slow) by 35 Japanese speakers (22 males and 13 females) in five sessions over ten months. Each speaker provided only 5 training utterances (about 20 seconds in total). Using a short test utterance (about 4 seconds), we obtained an accuracy of 98.8% for text-independent speaker identification across the three speaking style modes. This result was superior to those of conventional methods on the same database. We show that this favorable result comes from the complementary, mutually compensating effect between the speaker-specific GMM and the speaker-adapted syllable-based HMM.
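The abstract does not state how the two model scores are combined, so the following is only a minimal sketch of one common option: a weighted sum of per-speaker log-likelihoods followed by an argmax over speakers. The function names (`fuse_scores`, `identify`), the weight `alpha`, the speaker labels, and all score values are hypothetical and are not taken from the paper.

```python
from typing import Dict


def fuse_scores(gmm_loglik: Dict[str, float],
                hmm_loglik: Dict[str, float],
                alpha: float = 0.5) -> Dict[str, float]:
    """Linearly combine two per-speaker log-likelihood scores.

    alpha weights the GMM score and (1 - alpha) weights the HMM score.
    This linear rule is an assumption made for illustration.
    """
    if gmm_loglik.keys() != hmm_loglik.keys():
        raise ValueError("both models must score the same set of speakers")
    return {
        spk: alpha * gmm_loglik[spk] + (1.0 - alpha) * hmm_loglik[spk]
        for spk in gmm_loglik
    }


def identify(gmm_loglik: Dict[str, float],
             hmm_loglik: Dict[str, float],
             alpha: float = 0.5) -> str:
    """Return the speaker with the highest fused score."""
    fused = fuse_scores(gmm_loglik, hmm_loglik, alpha)
    return max(fused, key=fused.get)


# Hypothetical average log-likelihoods of one test utterance under each
# speaker's models. Each model alone ranks a different wrong speaker first.
gmm_scores = {"spk_a": -51.0, "spk_b": -50.0, "spk_c": -58.0}
hmm_scores = {"spk_a": -48.0, "spk_b": -55.0, "spk_c": -47.0}

print(identify(gmm_scores, hmm_scores, alpha=1.0))  # GMM only
print(identify(gmm_scores, hmm_scores, alpha=0.0))  # HMM only
print(identify(gmm_scores, hmm_scores, alpha=0.5))  # fused
```

With these made-up scores, the GMM alone prefers `spk_b` and the HMM alone prefers `spk_c`, while the fused score prefers `spk_a`. This is one way to picture the compensating effect described above: errors made by one model can be offset by the other when their mistakes do not coincide.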

[1]  Wei Zhang,et al.  Text-independent speaker recognition by speaker-specific GMM and speaker adapted syllable-based HMM , 2003, INTERSPEECH.

[2]  Sadaoki Furui,et al.  Concatenated phoneme models for text-variable speaker recognition , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Seiichi Nakagawa,et al.  Text-independent speaker recognition using non-linear frame likelihood transformation , 1998, Speech Commun..

[4]  Sadaoki Furui,et al.  Comparison of text-independent speaker recognition methods using VQ-distortion and discrete/continuous HMMs , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Douglas A. Reynolds,et al.  Integration of speaker recognition into conversational spoken dialogue systems , 2003, INTERSPEECH.

[6]  Seiichi Nakagawa,et al.  Text-independent speaker identification on TIMIT database , 1995 .

[7]  Douglas E. Sturim,et al.  Speaker verification using text-constrained Gaussian Mixture Models , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Masafumi Nishida,et al.  Speaker recognition by separating phonetic space and speaker space , 2001, INTERSPEECH.

[9]  Frank K. Soong,et al.  Continuous probabilistic acoustic map for speaker identification , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Douglas E. Sturim,et al.  Speaker Verification using Text-Constrained Gaussian , 2002 .

[11]  M. Savic,et al.  Variable parameter speaker verification system based on hidden Markov modeling , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[12]  Chiyomi Miyajima,et al.  Speaker identification using Gaussian mixture models based on multi-space probability distribution , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[13]  Alex Park,et al.  ASR dependent techniques for speaker identification , 2002, INTERSPEECH.

[14]  Frank K. Soong,et al.  An orthogonal polynomial representation of speech signals and its probabilistic model for text independent speaker verification , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[15]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[16]  Steve Young,et al.  The HTK book , 1995 .