Speech Emotion Recognition via Attention-based DNN from Multi-Task Learning

Speech carries rich potential for emotion recognition. Accurate, real-time understanding of human emotion from speech benefits human-computer interaction. Previous work is often limited to coarse-grained emotion learning tasks or suffers from low recognition precision. To address these problems, we construct a large-scale real-world corpus covering four common emotions (anger, happiness, neutral, and sadness), and we propose a multi-task attention-based DNN model (MT-A-DNN) for emotion learning. MT-A-DNN efficiently learns the high-order dependencies and non-linear correlations underlying the audio data. Extensive experiments show that MT-A-DNN outperforms conventional methods on emotion recognition, taking a step toward real-time acoustic emotion recognition on smart audio devices.
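
To make the architecture concrete, the sketch below illustrates one plausible reading of a multi-task attention-based DNN in PyTorch: frame-level acoustic features are encoded, pooled with soft attention over time, and fed to a four-class emotion head plus an auxiliary head to illustrate the multi-task setup. The feature dimension, layer sizes, auxiliary task, and loss weighting are assumptions for illustration only, not the paper's reported configuration.

# Minimal sketch of a multi-task attention-based DNN (MT-A-DNN style).
# Assumptions (not from the abstract): frame-level MFCC features, soft
# attention pooling, a 4-class emotion head, and a hypothetical binary
# auxiliary head used only to show the multi-task loss.
import torch
import torch.nn as nn

class MTADNN(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=128, num_emotions=4, num_aux=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Soft attention over time: score each frame, normalize, pool.
        self.attn = nn.Linear(hidden_dim, 1)
        self.emotion_head = nn.Linear(hidden_dim, num_emotions)  # main task
        self.aux_head = nn.Linear(hidden_dim, num_aux)           # auxiliary task

    def forward(self, x):                     # x: (batch, frames, feat_dim)
        h = self.encoder(x)                   # (batch, frames, hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)  # per-frame attention weights
        utt = (w * h).sum(dim=1)              # attention-pooled utterance embedding
        return self.emotion_head(utt), self.aux_head(utt)

# Joint training combines both task losses; the 0.3 weight is an assumed value.
model = MTADNN()
x = torch.randn(8, 200, 40)                  # 8 utterances, 200 frames each
emo_logits, aux_logits = model(x)
emo_y = torch.randint(0, 4, (8,))
aux_y = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(emo_logits, emo_y) + 0.3 * nn.CrossEntropyLoss()(aux_logits, aux_y)
loss.backward()

The attention pooling replaces plain averaging so that emotionally salient frames contribute more to the utterance embedding, which is the usual motivation for attention in this setting.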
