Novel third-order hidden Markov models for speaker identification in shouted talking environments

Speaker identification systems perform almost perfectly in neutral talking environments but poorly in shouted talking environments. This work proposes, implements, and evaluates novel models, called Third-Order Hidden Markov Models (HMM3s), to improve the performance of text-independent speaker identification systems in shouted talking environments. The proposed models were evaluated on our collected speech database using Mel-Frequency Cepstral Coefficients (MFCCs) as features. Our results show that, in shouted talking environments, HMM3s significantly improve speaker identification performance compared with second-order hidden Markov models (HMM2s) and first-order hidden Markov models (HMM1s), by 12.4% and 202.4%, respectively. The results achieved with the proposed models are close to those obtained in subjective assessment by human listeners.
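
As a rough illustration of what "third-order" means here, the sketch below scores a state sequence under a Markov chain in which each state depends on the three preceding states, rather than on one (HMM1s) or two (HMM2s). It is only a minimal sketch under stated assumptions: the function names, the number of states, and the randomly generated transition tensor are hypothetical placeholders, not the trained parameters or the training procedure used in this work, and observation (MFCC) likelihoods are left out entirely.

```python
import numpy as np


def third_order_transitions(n_states, seed=0):
    """Build a placeholder third-order transition tensor (hypothetical values).

    a[i, j, k, w] stands for P(q_t = w | q_{t-3} = i, q_{t-2} = j, q_{t-1} = k).
    The entries are random and only normalised so that each conditional
    distribution over the next state w sums to one.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n_states,) * 4)
    return a / a.sum(axis=-1, keepdims=True)


def state_sequence_log_prob(states, a):
    """Log-probability of states[3:] given the first three states."""
    logp = 0.0
    for t in range(3, len(states)):
        i, j, k, w = states[t - 3], states[t - 2], states[t - 1], states[t]
        logp += np.log(a[i, j, k, w])
    return float(logp)


# Assumed toy setting: 3 states and an arbitrary state sequence.
a = third_order_transitions(n_states=3)
print(state_sequence_log_prob([0, 1, 2, 0, 1, 1], a))
```

With N states, the transition tensor has N^4 entries, compared with N^2 for a first-order model. This shows how the extra temporal context of a higher-order model comes with a larger parameter count.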
