Comparison of speaker dependent and speaker independent emotion recognition

Abstract This paper describes a study of emotion recognition based on speech analysis. The introduction to the theory reviews the emotion inventories used in various studies of emotion recognition, the speech corpora applied, the methods of speech parametrization, and the most commonly employed classification algorithms. In the current study, the EMO-DB speech corpus and three selected classifiers, the k-Nearest Neighbor (k-NN), the Artificial Neural Network (ANN), and Support Vector Machines (SVMs), were used in experiments. SVMs provided the best classification accuracy, 75.44%, in the speaker dependent mode, that is, when speech samples from the same speaker were included in the training corpus. Various speaker dependent and speaker independent configurations were analyzed and compared. Emotion recognition in speaker dependent conditions usually yielded higher accuracy than a similar but speaker independent configuration; the improvement was especially noticeable when the base recognition rate of a given speaker was low. Happiness and anger, as well as boredom and neutrality, proved to be the pairs of emotions most often confused.
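
As an illustration of the two evaluation conditions compared in the abstract, the sketch below contrasts a speaker dependent split (stratified cross-validation over all utterances, so the test speaker also appears in training) with a speaker independent split (leave-one-speaker-out). This is not the authors' pipeline: the classifier settings and the placeholder feature matrix are assumptions, and in practice acoustic features would first be extracted from the EMO-DB recordings.

```python
# Minimal sketch, assuming pre-extracted acoustic features per utterance.
# Random data stands in for real EMO-DB features (e.g., MFCC/pitch statistics).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_features = 500, 40                    # hypothetical feature set size
X = rng.normal(size=(n_samples, n_features))       # placeholder acoustic features
y = rng.integers(0, 7, size=n_samples)             # 7 emotion classes, as in EMO-DB
speakers = rng.integers(0, 10, size=n_samples)     # EMO-DB contains 10 speakers

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Speaker dependent: utterances from every speaker may occur in both
# the training and the test folds.
sd_scores = cross_val_score(
    clf, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

# Speaker independent: leave-one-speaker-out, so the test speaker is
# never seen during training.
si_scores = cross_val_score(clf, X, y, groups=speakers, cv=LeaveOneGroupOut())

print(f"speaker dependent accuracy:   {sd_scores.mean():.3f}")
print(f"speaker independent accuracy: {si_scores.mean():.3f}")
```

With real features, the speaker dependent protocol would typically score higher, mirroring the paper's observation that including a speaker's own samples in training improves recognition.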
