Paper Information - A Syllable Lattice Approach to Speaker Verification

A Syllable Lattice Approach to Speaker Verification

This paper proposes a syllable-lattice-based speaker verification algorithm for Mandarin Chinese input. For each speech utterance, a syllable lattice is generated by a speaker-independent large-vocabulary continuous speech recognition system running in free syllable decoding mode. The verification decision is based on the likelihood ratio between a target-speaker model and a speaker-independent background model, computed on the decoded syllable lattice. The likelihood function is calculated efficiently with a forward algorithm that considers all paths in the lattice. The proposed algorithm was evaluated on a Mandarin Chinese database in which 1832 true trials and 26 250 impostor trials were recorded by 19 target speakers and 180 impostors. The average trial duration is 2 s, excluding silence. The target-speaker model was adapted from the speaker-independent background model using two minutes of enrollment data, including silence. The proposed algorithm achieved an equal-error rate of 0.857%, compared with 1.21% for the hidden Markov model-based speaker verification algorithm that does not use syllable lattices. The equal-error rate was further reduced to 0.617% by combining the proposed algorithm with the Gaussian mixture model-universal background model (GMM-UBM) algorithm with 2048 Gaussian kernels, which on its own achieves an equal-error rate of 0.990%.
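The lattice scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lattice representation (a list of `(src, dst, log_score)` arcs with integer node ids satisfying `src < dst`), the function names, and the zero threshold are all assumptions made for illustration. Each arc score is assumed to be a precomputed log-likelihood of one syllable arc under a given model, and the forward pass sums the likelihood over every path from the start node to the end node.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def lattice_log_likelihood(arcs, start, end):
    """Forward algorithm over an acyclic lattice.

    arcs: list of (src, dst, log_score); node ids are assumed to be
    integers in topological order, so every arc has src < dst.
    Returns the log of the likelihood summed over all start-to-end paths.
    """
    incoming = defaultdict(list)
    for src, dst, log_score in arcs:
        if src >= dst:
            raise ValueError("arcs must satisfy src < dst")
        incoming[dst].append((src, log_score))

    alpha = {start: 0.0}  # log forward probability of each reachable node
    for node in sorted(incoming):
        terms = [alpha[s] + w for s, w in incoming[node] if s in alpha]
        if terms:
            alpha[node] = logsumexp(terms)
    return alpha.get(end, -math.inf)


def verify(target_arcs, background_arcs, start, end, threshold=0.0):
    """Accept when the log-likelihood ratio meets a (hypothetical) threshold."""
    llr = (lattice_log_likelihood(target_arcs, start, end)
           - lattice_log_likelihood(background_arcs, start, end))
    return llr, llr >= threshold


# Toy lattice: two competing syllable arcs from node 0 to 1, then one arc to 2.
target = [(0, 1, math.log(0.5)), (0, 1, math.log(0.25)), (1, 2, math.log(0.5))]
background = [(0, 1, math.log(0.25)), (0, 1, math.log(0.25)), (1, 2, math.log(0.5))]

llr, accepted = verify(target, background, start=0, end=2)
print(f"log-likelihood ratio = {llr:.4f}, accepted = {accepted}")
```

In this toy example the same lattice topology is scored twice, once with target-model arc scores and once with background-model arc scores, which mirrors the likelihood-ratio test in the abstract. In practice the threshold would be tuned on development data rather than fixed at zero.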
