Far-Field Speaker Recognition

In this paper, we study robust speaker recognition in far-field microphone situations. Two approaches are investigated to improve the robustness of speaker recognition in such scenarios. The first approach applies traditional techniques based on acoustic features. We introduce reverberation compensation as well as feature warping and gain significant improvements, even under mismatched training-testing conditions. In addition, we performed multiple channel combination experiments to make use of information from multiple distant microphones. Overall, we achieved up to 87.1% relative improvements on our Distant Microphone database and found that the gains hold across different data conditions and microphone settings. The second approach makes use of higher-level linguistic features. To capture speaker idiosyncrasies, we apply n-gram models trained on multilingual phone strings and show that higher-level features are more robust under mismatching conditions. Furthermore, we compared the performances between multilingual and multiengine systems, and examined the impact of a number of involved languages on recognition results. Our findings confirm the usefulness of language variety and indicate a language independent nature of this approach, which suggests that speaker recognition using multilingual phone strings could be successfully applied to any given language.

[1]  Ronald Rosenfeld,et al.  Statistical language modeling using the CMU-cambridge toolkit , 1997, EUROSPEECH.

[2]  Joseph P. Campbell,et al.  Phonetic speaker recognition , 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256).

[3]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[4]  Tanja Schultz,et al.  Multilingual Speech Processing , 2006 .

[5]  S. Furui,et al.  Cepstral analysis technique for automatic speaker verification , 1981 .

[6]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.

[7]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[8]  Tanja Schultz,et al.  Far-Field Speaker Recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  Alex Waibel,et al.  Robust speaker recognition , 2007 .

[10]  Ramesh A. Gopinath,et al.  Short-time Gaussianization for robust speaker verification , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[11]  Joseph P. Campbell,et al.  Gender-dependent phonetic refraction for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Hynek Hermansky,et al.  RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..

[13]  Tanja Schultz,et al.  Language-independent and language-adaptive acoustic modeling for speech recognition , 2001, Speech Commun..

[14]  Tanja Schultz,et al.  Improvements in Non-Verbal Cue Identification Using Multilingual Phone Strings , 2002, Speech-to-Speech Translation@ACL.

[15]  Douglas A. Reynolds,et al.  Combining cross-stream and time dimensions in phonetic speaker recognition , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[16]  Tanja Schultz,et al.  Speaker identification using multilingual phone strings , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Andreas Stolcke,et al.  The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[18]  Elizabeth Shriberg,et al.  Using prosodic and lexical information for speaker identification , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.