Speech Emotion Recognition Based on Multi-Task Learning

The complexity of emotion generation, expression, and data annotation makes speech emotion recognition very challenging. As a form of transfer learning, multi-task learning can aggregate multiple related corpora to share data, and can share representations at the feature level by exploiting the correlations among tasks, improving both training efficiency and accuracy. In this paper, we investigate the application of multi-task learning to speech emotion recognition, covering model analysis, database selection, and feature extraction, and we identify the key open problems for future research.
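To make the feature-level sharing concrete, the sketch below shows hard parameter sharing, the most common multi-task architecture: a shared hidden layer feeds two task-specific heads (categorical emotion classification and continuous arousal regression), and the losses are combined with a weighting factor. This is a minimal illustrative example assuming NumPy only; the dimensions, the two chosen tasks, and the weighting scheme are assumptions for illustration, not the exact models surveyed in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATS = 40      # frame- or utterance-level acoustic features (assumed size)
N_SHARED = 16     # shared hidden units
N_EMOTIONS = 4    # e.g. angry / happy / sad / neutral (assumed label set)

# Shared hidden layer: both tasks reuse these weights (hard parameter sharing).
W_shared = rng.standard_normal((N_FEATS, N_SHARED)) * 0.1
# Task-specific output heads.
W_emo = rng.standard_normal((N_SHARED, N_EMOTIONS)) * 0.1
W_aro = rng.standard_normal((N_SHARED, 1)) * 0.1

def forward(x):
    h = np.tanh(x @ W_shared)            # shared representation
    logits = h @ W_emo                   # categorical emotion logits
    arousal = (h @ W_aro).squeeze(-1)    # continuous arousal estimate
    return logits, arousal

def multitask_loss(logits, arousal, y_emo, y_aro, alpha=0.5):
    # Cross-entropy for the categorical task (numerically stable log-softmax).
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(y_emo)), y_emo].mean()
    # Mean squared error for the regression task.
    mse = np.mean((arousal - y_aro) ** 2)
    # Fixed linear weighting of the two task losses (one common choice).
    return alpha * ce + (1 - alpha) * mse

x = rng.standard_normal((8, N_FEATS))    # a batch of 8 utterance feature vectors
y_emo = rng.integers(0, N_EMOTIONS, 8)   # categorical labels
y_aro = rng.standard_normal(8)           # arousal targets

logits, arousal = forward(x)
loss = multitask_loss(logits, arousal, y_emo, y_aro)
```

Because the gradient of the combined loss flows through `W_shared` from both heads, the shared layer is pushed toward features useful for both tasks; this is the mechanism behind the efficiency and accuracy gains discussed above.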
