Fusing acoustic, phonetic and data-driven systems for text-independent speaker verification

This paper describes our recent efforts in exploring datadriven high-level features and their combination with low-level spectral features for speaker verification. In particular, we compare the phonetic and data-driven approaches and study their complementarity with short-term acoustic approach. Our objective is to show that data-driven units automatically acquired from the speech data, can be used like phonemes to extract highlevel features and to bring complementary speaker-specific information that can therefore provide improvements when fused with acoustic systems. Results obtained on the NIST 2006 Speaker Recognition Evaluation data show that the combination of the phonetic, data-driven and Gaussian Mixture Models (GMM) systems brings a 27% relative reduction of the EER in comparison to the baseline GMM system.

[1]  Douglas A. Reynolds,et al.  Fusing high- and low-level features for speaker recognition , 2003, INTERSPEECH.

[2]  Sara H. Basson,et al.  NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[3]  Andreas Stolcke,et al.  Modeling duration patterns for speaker recognition , 2003, INTERSPEECH.

[4]  Asmaa El Hannani,et al.  Exploiting High-Level Information Provided by ALISP in Speaker Recognition , 2005, NOLISP.

[5]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[6]  Douglas A. Reynolds,et al.  The SuperSID project: exploiting high-level information for high-accuracy speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[7]  Asmaa El Hannani,et al.  Improving Speaker Verification Using ALISP-Based Specific GMMs , 2005, AVBPA.

[8]  Julian Fiérrez,et al.  Support vector machine fusion of idiolectal and acoustic speaker information in Spanish conversational speech , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[9]  Doroteo Torre Toledano,et al.  Using Data-driven and Phonetic Units for Speaker Verification , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[10]  Gérard Chollet,et al.  Toward ALISP: A proposal for Automatic Language Independent Speech Processing , 1999 .