Wavelet-Based Time-Frequency Representations for Automatic Recognition of Emotions from Speech

Interest in emotion recognition from speech has grown over the last decade. Emotion recognition can improve the quality of services and people's quality of life. One of the main problems in emotion recognition from speech is finding suitable features to represent emotional content. This paper proposes new features based on the energy content of wavelet-based time-frequency (TF) representations to model emotional speech. Three TF representations are considered: (1) the continuous wavelet transform, (2) the bionic wavelet transform, and (3) the synchrosqueezed wavelet transform. Classification is performed using GMM supervectors. Several classification problems are addressed, including high vs. low arousal, positive vs. negative valence, and multiple emotions. The results indicate that the proposed features are useful for classifying high vs. low arousal emotions, and that the features derived from the synchrosqueezed wavelet transform are more suitable for modeling emotional speech than those derived from the other two transforms.
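
To make the idea of "energy content of a wavelet-based TF representation" concrete, the following is a minimal sketch for the continuous wavelet transform only. It is not the paper's implementation: the abstract does not specify the mother wavelet, the frequency grid, or the framing, so the Morlet wavelet, the parameter `w0 = 6.0`, the frame sizes, and the function names `morlet_cwt` and `frame_log_energy` are all assumptions made for illustration. The bionic and synchrosqueezed transforms and the GMM supervector classifier are not shown.

```python
import cmath
import math


def morlet_cwt(signal, fs, freqs, w0=6.0):
    """Squared magnitude of a Morlet CWT, one row per analysis frequency.

    Assumption: Morlet mother wavelet with centre parameter w0; the
    wavelet actually used in the paper is not stated in the abstract.
    """
    dt = 1.0 / fs
    n = len(signal)
    scalogram = []
    for f in freqs:
        s = w0 / (2.0 * math.pi * f)            # scale (seconds) for centre frequency f
        half = int(math.ceil(4.0 * s * fs))     # truncate Gaussian envelope at 4 std devs
        norm = (math.pi ** -0.25) * dt / math.sqrt(s)
        offsets = range(-half, half + 1)
        # Complex conjugate of the scaled wavelet, sampled at the signal rate.
        kernel = [
            norm * math.exp(-0.5 * (k * dt / s) ** 2) * cmath.exp(-1j * w0 * k * dt / s)
            for k in offsets
        ]
        row = []
        for b in range(n):
            acc = 0j
            for j, k in enumerate(offsets):
                idx = b + k
                if 0 <= idx < n:                # samples outside the signal count as zero
                    acc += signal[idx] * kernel[j]
            row.append(abs(acc) ** 2)
        scalogram.append(row)
    return scalogram


def frame_log_energy(scalogram, frame_len, hop, eps=1e-12):
    """Log of the summed energy per frequency row within each frame."""
    n = len(scalogram[0])
    features = []
    for start in range(0, n - frame_len + 1, hop):
        features.append(
            [math.log(sum(row[start:start + frame_len]) + eps) for row in scalogram]
        )
    return features


if __name__ == "__main__":
    fs = 2000                                   # hypothetical sampling rate in Hz
    tone = [math.sin(2.0 * math.pi * 200.0 * i / fs) for i in range(400)]
    feats = frame_log_energy(morlet_cwt(tone, fs, [100.0, 200.0, 400.0]), 100, 100)
    for frame in feats:
        print([round(v, 2) for v in frame])
```

Each frame yields one log-energy value per analysis frequency; vectors of this kind are the sort of per-frame features that a GMM could be fitted to. The direct convolution is kept for readability and is slow on real recordings, where an FFT-based implementation would be the usual choice.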
