Speaker Verification in Emotional Talking Environments based on Third-Order Circular Suprasegmental Hidden Markov Model

Speaker verification accuracy in emotional talking environments is not as high as it is in neutral ones. This work aims to accept or reject a claimed speaker's identity from his or her voice in emotional environments, using the Third-Order Circular Suprasegmental Hidden Markov Model (CSPHMM3) as the classifier. The approach is evaluated on an Emirati-accented Arabic speech database, with Mel-Frequency Cepstral Coefficients (MFCCs) as the extracted features. The results show that speaker verification accuracy based on CSPHMM3 is higher than that obtained with state-of-the-art classifiers and models such as the Gaussian Mixture Model (GMM), Support Vector Machine (SVM), and Vector Quantization (VQ).
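The accept-or-reject decision at the heart of speaker verification can be illustrated with a minimal sketch. It is not the CSPHMM3 classifier from this work: a single diagonal Gaussian stands in for each model, and the log-likelihood-ratio rule, the function names (`gaussian_loglik`, `verify`), the background model, and the threshold value are all assumptions made for illustration. The feature frames would be MFCC vectors in practice; here they are small hand-written lists.

```python
import math


def gaussian_loglik(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian.

    Assumption: this simple model is a stand-in for a trained speaker
    model; it is NOT the CSPHMM3 classifier described in the abstract.
    """
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def verify(frames, claimed_model, background_model, threshold=0.0):
    """Accept the identity claim if the log-likelihood ratio meets the threshold.

    Each model is a hypothetical (mean, var) pair. The threshold is an
    illustrative value; a real system would tune it on held-out data.
    """
    llr = gaussian_loglik(frames, *claimed_model) - gaussian_loglik(
        frames, *background_model
    )
    return llr >= threshold, llr


# Toy two-dimensional "feature frames" (real frames would be MFCC vectors).
frames = [[0.0, 0.0], [0.1, -0.1]]
claimed = ([0.0, 0.0], [1.0, 1.0])      # hypothetical claimed-speaker model
background = ([3.0, 3.0], [1.0, 1.0])   # hypothetical background model

accepted, score = verify(frames, claimed, background)
print("accepted" if accepted else "rejected", round(score, 3))
```

Raising the threshold makes the system stricter (fewer false acceptances, more false rejections), which is the trade-off any verification system has to balance regardless of the underlying classifier.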
