Paper information - Speaker identification in shouted talking environments based on novel Third-Order Hidden Markov Models

Speaker identification in shouted talking environments based on novel Third-Order Hidden Markov Models

In this work we propose, implement, and evaluate novel models, called Third-Order Hidden Markov Models (HMM3s), to improve the low performance of text-independent speaker identification in shouted talking environments. The proposed models were tested on our own collected speech database using Mel-Frequency Cepstral Coefficients (MFCCs) as features. Our results show that HMM3s significantly improve speaker identification performance in such environments, by 11.3% and 166.7% compared to second-order hidden Markov models (HMM2s) and first-order hidden Markov models (HMM1s), respectively. The results achieved with the proposed models are close to those obtained in a subjective assessment by human listeners.

Ismail Shahin
