Speaker Modeling Using Emotional Speech for More Robust Speaker Identification

Automatic identity recognition in a fast, reliable, and non-intrusive way is one of the most challenging topics in today's digital world. One possible approach is identification by voice. The speech characteristics relevant for automatic speaker recognition can be affected by external factors such as noise and channel distortions, but also by speaker-specific conditions such as emotional or health states. This paper addresses the improvement of a speaker recognition system through different model training strategies, with the aim of obtaining the best system performance from only a limited amount of neutral and emotional speech data. The models adopted are a Gaussian Mixture Model and i-vectors, both taking Mel Frequency Cepstral Coefficients as input, and the experiments were conducted on the Russian Language Affective speech database. The results show that appropriate use of emotional speech in speaker model training improves the robustness of a speaker recognition system, both when tested on neutral speech and when tested on emotional speech.
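As a rough illustration of what comparing training strategies for a GMM speaker model can look like, the following minimal sketch fits one diagonal-covariance Gaussian Mixture Model per speaker and identifies a test utterance by maximum average log-likelihood. Everything in it is an assumption made for illustration: the features are synthetic stand-ins for MFCC frames (no audio and no data from the Russian Language Affective speech database are used), the constant `EMOTION_SHIFT`, the speaker centres, and the function names are hypothetical, and the plain EM routine is a generic textbook version, not the configuration used in the paper. The i-vector system is not sketched here.

```python
import numpy as np

N_SPEAKERS, DIM = 3, 4
# Assumed toy effect of emotion: a fixed offset on the first feature.
EMOTION_SHIFT = np.array([2.5, 0.0, 0.0, 0.0])

def speaker_mean(s):
    # Hypothetical speaker centres, 4 units apart along the first feature.
    return np.array([4.0 * s, 1.0, -1.0, 0.5])

def frames(rng, s, emotional, n=200):
    # Synthetic "MFCC-like" frames for speaker s (not real speech features).
    mean = speaker_mean(s) + (EMOTION_SHIFT if emotional else 0.0)
    return mean + 0.5 * rng.standard_normal((n, DIM))

def component_log_probs(X, weights, means, variances):
    # Returns an (n_frames, n_components) array of
    # log w_k + log N(x | mu_k, diag(var_k)).
    diff = X[:, None, :] - means[None, :, :]
    log_det = np.log(2.0 * np.pi * variances).sum(axis=1)
    mahal = (diff ** 2 / variances[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + mahal)

def fit_diag_gmm(X, n_components=2, n_iter=30, seed=0):
    # Plain EM for a diagonal-covariance GMM; returns (weights, means, variances).
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-3, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = component_log_probs(X, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2 + 1e-3
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    # Average per-frame log-likelihood of utterance X under one speaker model.
    return np.logaddexp.reduce(component_log_probs(X, *gmm), axis=1).mean()

def identify(X, models):
    # Closed-set identification: pick the speaker model that scores highest.
    return int(np.argmax([avg_log_likelihood(X, m) for m in models]))

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    neutral = [frames(rng, s, False) for s in range(N_SPEAKERS)]
    emotional = [frames(rng, s, True) for s in range(N_SPEAKERS)]
    strategies = {
        "neutral_only": neutral,
        "neutral_plus_emotional": [np.vstack(p) for p in zip(neutral, emotional)],
    }
    results = {}
    for name, train_sets in strategies.items():
        models = [fit_diag_gmm(X) for X in train_sets]
        for cond, emo in (("neutral", False), ("emotional", True)):
            hits = [identify(frames(rng, s, emo, n=100), models) == s
                    for s in range(N_SPEAKERS)]
            results[(name, cond)] = float(np.mean(hits))
    return results

if __name__ == "__main__":
    for key, acc in run_demo().items():
        print(key, acc)
```

Calling `run_demo()` returns a dictionary keyed by (training strategy, test condition) with the identification accuracy on the toy data. Because the emotional offset is hand-picked, these numbers only show how such a comparison can be set up and say nothing about the results reported in the paper.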
