Speaker Identification Using Whispered Speech

This paper studies closed-set, text-independent speaker identification using whispered speech. A new feature, temporal Teager energy based sub-band cepstral coefficients (TTESBCC), is proposed. The work compares the performance of four feature sets: Mel frequency cepstral coefficients (MFCC), temporal energy of sub-band cepstral coefficients (TESBCC), weighted instantaneous frequency (WIF), and TTESBCC. Next, the outputs of three classifiers are combined, and the performance of the combination is compared with that of the individual classifiers. The speaker identification system is trained on neutral speech and tested on both neutral and whispered speech. The experiments use a database of twenty-five speakers containing utterances recorded in Marathi, an Indian language, in the neutral and whispered environments. A Gaussian mixture model is used for classification. The performance of the speaker identification system degrades drastically when it is tested on whispered utterances. Fusion of classifiers improves speaker identification accuracy in both the whispered and neutral environments.
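
The abstract does not spell out how TTESBCC is computed, so the following is only a minimal sketch of the standard discrete Teager energy operator that the feature name refers to, not the authors' feature extraction. The function name `teager_energy` and the test-tone parameters are illustrative assumptions.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two samples
    shorter than the input. (Hypothetical helper, not from the paper.)
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Sanity check on a pure tone A*cos(omega*n): the operator output is the
# constant A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 0.5, 0.3  # assumed values, chosen only for illustration
tone = [amplitude * math.cos(omega * n) for n in range(64)]
energy = teager_energy(tone)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Because the operator responds to both the amplitude and the frequency of a signal, it is a common starting point for energy-based cepstral features; how the paper applies it per sub-band and over time is not stated here.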
