Text-independent speaker recognition by speaker-specific GMM and speaker adapted syllable-based HMM

We present a new text-independent speaker recognition method that combines a speaker-specific Gaussian mixture model (GMM) with syllable-based HMMs adapted to each speaker by maximum likelihood linear regression (MLLR) or maximum a posteriori (MAP) estimation. We evaluated the robustness of this method to changes in speaking style. The speaker identification experiment used the NTT database, which consists of sentences uttered at three speaking rates (normal, fast and slow) by 35 Japanese speakers (22 males and 13 females) in five sessions over ten months. Each speaker provided only five training utterances. The method achieved 100% accuracy for text-independent speaker identification, a better result than that of some conventional methods on the same database.
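As a rough illustration of what combining the two models can look like at decision time, the sketch below fuses per-speaker log-likelihoods from a GMM and from an adapted HMM and picks the best-scoring speaker. The abstract does not state the combination rule, so the weighted sum, the `weight` parameter, and the function name `identify_speaker` are all assumptions made for illustration, not the authors' method. The input scores are assumed to be already computed (for example, average frame log-likelihoods of the test utterance under each speaker's models).

```python
from typing import Dict


def identify_speaker(
    gmm_loglik: Dict[str, float],
    hmm_loglik: Dict[str, float],
    weight: float = 0.5,
) -> str:
    """Return the speaker with the highest fused score.

    Hypothetical fusion rule (an assumption, not taken from the paper):
        score(s) = weight * GMM log-likelihood(s)
                 + (1 - weight) * adapted-HMM log-likelihood(s)
    """
    if gmm_loglik.keys() != hmm_loglik.keys():
        raise ValueError("both models must score the same set of speakers")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    # Weighted sum of the two log-likelihoods for every enrolled speaker.
    combined = {
        speaker: weight * gmm_loglik[speaker]
        + (1.0 - weight) * hmm_loglik[speaker]
        for speaker in gmm_loglik
    }
    # Closed-set identification: choose the speaker with the largest score.
    return max(combined, key=combined.get)
```

With `weight=1.0` the decision relies on the GMM alone, and with `weight=0.0` on the adapted HMM alone; in practice such a weight would have to be tuned on held-out data.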
