Whisper is a speech production mode, distinct from neutral speech, that talkers use intentionally in natural conversation to protect personal privacy and avoid being overheard. Because whispered and neutral speech differ in vocal excitation and vocal tract function, speaker ID systems trained on neutral speech perform significantly worse when tested on whispered speech. In this study, a closed-set speaker ID task based on MFCC-GMM models trained on neutral speech is considered. It is observed that, for whispered speaker recognition, the degradation is concentrated in a particular subset of speakers. An acoustic analysis is then conducted to determine the causes of the degradation for those speakers. Finally, a confidence space is proposed to measure the quality of whispered speech for the speaker ID task. Experimental evaluations demonstrate the effectiveness of this method in identifying whispered utterances that carry poor speaker information in a neutral/whisper mismatched speaker ID system. The proposed method makes it possible to compensate for those poor utterances while avoiding any harm to the other utterances, which retain the performance of the neutral speaker ID task.
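The baseline named above is a closed-set MFCC-GMM speaker ID system. The following is a minimal sketch of how such a system typically scores a test utterance. It assumes one diagonal-covariance GMM per enrolled speaker, already trained on neutral MFCC frames (feature extraction and training are omitted). The model layout and function names are hypothetical, chosen for illustration. Each speaker model assigns the utterance an average per-frame log-likelihood, and the highest-scoring speaker is selected. The top-1/top-2 score margin is returned only as a generic reliability indicator. It is not the confidence space proposed in this work.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model layout: a diagonal-covariance GMM stored as a list of
# (weight, mean vector, variance vector) tuples, one tuple per component.
GMM = List[Tuple[float, Sequence[float], Sequence[float]]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of frame x under a Gaussian with diagonal covariance."""
    ll = 0.0
    for xi, mi, vi in zip(x, mean, var):
        ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll


def frame_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log p(x | GMM), combining components with log-sum-exp for stability."""
    comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(comps)
    return peak + math.log(sum(math.exp(c - peak) for c in comps))


def utterance_score(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Average per-frame log-likelihood, treating frames as independent."""
    return sum(frame_log_likelihood(f, gmm) for f in frames) / len(frames)


def identify(frames: Sequence[Sequence[float]],
             models: Dict[str, GMM]) -> Tuple[str, float]:
    """Closed-set ID: return the best-scoring speaker and the margin between
    the top two scores (a generic indicator, not the paper's confidence space)."""
    ranked = sorted(((utterance_score(frames, g), spk)
                     for spk, g in models.items()), reverse=True)
    best_score, best_spk = ranked[0]
    margin = best_score - ranked[1][0] if len(ranked) > 1 else float("inf")
    return best_spk, margin


if __name__ == "__main__":
    # Toy 2-dimensional "MFCC" models for two hypothetical speakers.
    models = {
        "spk_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
        "spk_b": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])],
    }
    frames = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]
    print(identify(frames, models))
```

Averaging over frames, rather than summing, keeps scores comparable across utterances of different lengths. A small margin means the winning speaker barely beat the runner-up.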