Speaker verification based on combining speaker individuality parameter selection and decision

We propose a new framework for incorporating speaker individuality parameters, such as pitch, vocal tract length and speaking rate, into the design of speaker recognition systems. Our preliminary observations suggest that, for some speakers, a single pitch parameter may discriminate better than a vector of cepstral features. Previous efforts have focused on concatenating these speaker parameters to existing MFCC-based feature vectors. In this study, we instead propose a procedure to compare the effectiveness of the available set of parameters; the chosen parameter is then used to perform speaker verification. We test the proposed framework on the TIMIT database. Using an intuitive parameter selection procedure that chooses between a single pitch parameter and the conventional 39-dimensional MFCC vector on a separate validation set, we found that the number of parameter selection errors was reduced from 70, when only the MFCC parameter vector was used, to 25, when both parameter sets were made available in the selection process. For the 79 speakers whose pitch-based system was preferred for speaker verification, the average equal error rate was reduced from 23.1% to 18.4%. This strategy can be extended to incorporate other speaker individuality parameters.
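As a rough illustration of the selection step, the sketch below picks, for one speaker, whichever parameter set gives the lower equal error rate (EER) on validation scores. The abstract does not state the actual selection criterion, so using validation EER is an assumption here, and the function names `equal_error_rate` and `select_parameter`, the score layout, and the toy scores are all hypothetical.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def select_parameter(validation_scores):
    """Return the parameter set with the lowest validation EER (assumed criterion).

    validation_scores maps a parameter name to a pair
    (target_scores, impostor_scores) for a single speaker.
    """
    eers = {
        name: equal_error_rate(targets, impostors)
        for name, (targets, impostors) in validation_scores.items()
    }
    return min(eers, key=eers.get), eers


# Toy validation scores for one speaker (made up for illustration only).
scores = {
    "mfcc": ([0.9, 0.4, 0.6], [0.5, 0.7, 0.2]),
    "pitch": ([0.8, 0.9, 0.7], [0.1, 0.3, 0.2]),
}
chosen, eers = select_parameter(scores)
print(chosen, eers)
```

In this toy case the pitch scores separate targets from impostors perfectly, so the pitch-based system would be chosen for this speaker; a real system would compute the scores from models trained on each parameter set.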
