Paper information - Novel third-order hidden Markov models for speaker identification in shouted talking environments

Speaker identification systems perform almost perfectly in neutral talking environments but poorly in shouted talking environments. This work proposes, implements, and evaluates novel models called Third-Order Hidden Markov Models (HMM3s) to improve the performance of text-independent speaker identification systems in shouted talking environments. The proposed models have been evaluated on our collected speech database using Mel-Frequency Cepstral Coefficients (MFCCs). Our results show that HMM3s significantly improve speaker identification performance in shouted talking environments, by 12.4% compared to second-order hidden Markov models (HMM2s) and by 202.4% compared to first-order hidden Markov models (HMM1s). The results achieved with the proposed models are close to those obtained in a subjective assessment by human listeners.
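To make the term "third-order" concrete, the following is a minimal sketch of a third-order Markov state chain, in which the next state depends on the three preceding states rather than only the most recent one. It is an illustration under stated assumptions, not the paper's implementation: the function names, the random transition table, and the three-state example are all hypothetical, and the emission side of an HMM (for example, modeling MFCC vectors) and training are omitted.

```python
import itertools
import random


def make_third_order_transitions(n_states, seed=0):
    """Build a hypothetical transition table for a third-order chain.

    Maps each history (i, j, k) of three previous states to a
    probability distribution over the next state.
    """
    rng = random.Random(seed)
    table = {}
    for history in itertools.product(range(n_states), repeat=3):
        # Random positive weights, normalized so each row sums to 1.
        weights = [rng.random() + 1e-9 for _ in range(n_states)]
        total = sum(weights)
        table[history] = [w / total for w in weights]
    return table


def sequence_probability(states, transitions):
    """Probability of states[3:] given the first three states."""
    prob = 1.0
    for t in range(3, len(states)):
        history = tuple(states[t - 3:t])  # the three preceding states
        prob *= transitions[history][states[t]]
    return prob


# Illustrative usage with an assumed 3-state model and an arbitrary path.
transitions = make_third_order_transitions(n_states=3)
path = [0, 1, 1, 2, 0, 2]
p = sequence_probability(path, transitions)
```

With N states, the table holds N^3 histories (27 here), compared with N for a first-order chain, which is the usual cost of conditioning on a longer state history.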

Ismail Shahin
