Speaker identification with whispered speech based on modified LFCC parameters and feature mapping

Much recent research in speaker recognition has been devoted to robustness against microphone and channel effects. However, changes in vocal effort, especially whispered speech, also present significant challenges to maintaining system performance. Because whisper lacks any periodic excitation, its spectral structure differs from that of neutral speech. As a result, the performance of speaker ID systems, which are trained mainly on high-energy voiced phonemes, degrades when they are tested with whisper. This study considers a front-end feature compensation method for whispered speech that improves speaker recognition with a neutral-trained system. First, an alternative feature vector based on linear frequency cepstral coefficients (LFCC) is introduced, motivated by spectral analysis of both speech modes. Next, for the first time, a feature mapping is proposed to reduce the whisper/neutral mismatch in speaker ID. The mapping is applied frame by frame between two speaker-independent Gaussian mixture models (GMMs), one of whispered and one of neutral speech. Text-independent, closed-set speaker ID results show an absolute 20% improvement in accuracy over a traditional MFCC-based system. This result confirms that the approach is viable for improving speaker ID performance across neutral and whispered speech conditions.
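The abstract does not give the mapping equation, so the following is only a minimal sketch of one common form of GMM-based feature mapping, not necessarily the authors' method. It assumes diagonal-covariance GMMs whose components are paired by index (for example, a whisper GMM adapted from the neutral GMM); the function names, the dictionary layout, and the toy parameters are all hypothetical. Each whispered frame is assigned to its best-scoring whisper component, then shifted and scaled to the statistics of the matching neutral component.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log-density of one frame x (D,) under each diagonal Gaussian (M, D)."""
    diff = x - means
    return -0.5 * (np.sum(np.log(2.0 * np.pi * variances), axis=1)
                   + np.sum(diff ** 2 / variances, axis=1))

def map_frame(x, whisper, neutral):
    """Map one whispered frame toward the neutral feature space.

    whisper / neutral are dicts with "weights" (M,), "means" (M, D) and
    "vars" (M, D). ASSUMPTION: component i of the whisper GMM corresponds
    to component i of the neutral GMM.
    """
    scores = np.log(whisper["weights"]) + log_gauss_diag(
        x, whisper["means"], whisper["vars"])
    i = int(np.argmax(scores))  # best-scoring whisper component
    scale = np.sqrt(neutral["vars"][i] / whisper["vars"][i])
    return (x - whisper["means"][i]) * scale + neutral["means"][i]

def map_utterance(frames, whisper, neutral):
    """Apply the mapping independently to every frame of a (T, D) array."""
    return np.stack([map_frame(x, whisper, neutral) for x in frames])

# Toy two-component, two-dimensional models (illustrative values only).
whisper_gmm = {
    "weights": np.array([0.5, 0.5]),
    "means": np.array([[0.0, 0.0], [10.0, 10.0]]),
    "vars": np.ones((2, 2)),
}
neutral_gmm = {
    "weights": np.array([0.5, 0.5]),
    "means": np.array([[1.0, 1.0], [20.0, 20.0]]),
    "vars": np.array([[4.0, 4.0], [1.0, 1.0]]),
}
frames = np.array([[0.5, -0.5], [10.0, 11.0]])
mapped = map_utterance(frames, whisper_gmm, neutral_gmm)
```

In this sketch the mapped frames would then be scored against speaker models trained on neutral speech, so the back end itself needs no retraining. Picking only the single best component keeps the mapping simple; a soft, posterior-weighted variant is an equally plausible design.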
