Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition
暂无分享,去创建一个
Jianhua Tao | Zheng Lian | Bin Liu | Jian Huang | J. Tao | Bin Liu | Jian Huang | Zheng Lian
[1] Seyedmahdad Mirsamadi,et al. Automatic speech emotion recognition using recurrent neural networks with local attention , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Wei-Qiang Zhang,et al. Towards Discriminative Representations and Unbiased Predictions: Class-Specific Angular Softmax for Speech Emotion Recognition , 2019, INTERSPEECH.
[3] Björn W. Schuller,et al. Categorical and dimensional affect analysis in continuous input: Current trends and future directions , 2013, Image Vis. Comput..
[4] Fabien Ringeval,et al. The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load , 2014, INTERSPEECH.
[5] Ngoc Thang Vu,et al. Attentive Convolutional Neural Network Based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech , 2017, INTERSPEECH.
[6] Joon Son Chung,et al. Utterance-level Aggregation for Speaker Recognition in the Wild , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Tieniu Tan,et al. Affective Computing: A Review , 2005, ACII.
[8] Tomás Pajdla,et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[10] Geoffrey E. Hinton,et al. Regularizing Neural Networks by Penalizing Confident Output Distributions , 2017, ICLR.
[11] Ya Li,et al. Long short term memory recurrent neural network based encoding method for emotion recognition in video , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Jian Huang,et al. Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function , 2018, INTERSPEECH.
[13] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[14] Dong Yu,et al. Speech emotion recognition using deep neural network and extreme learning machine , 2014, INTERSPEECH.
[15] Yuexian Zou,et al. Discriminative Feature Learning for Speech Emotion Recognition , 2019, ICANN.
[16] Carlos Busso,et al. IEMOCAP: interactive emotional dyadic motion capture database , 2008, Lang. Resour. Evaluation.
[17] Björn W. Schuller,et al. Recent developments in openSMILE, the munich open-source multimedia feature extractor , 2013, ACM Multimedia.
[18] Efthymios Tzinis,et al. Segment-based speech emotion recognition using recurrent neural networks , 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII).
[19] Margaret Lech,et al. Evaluating deep learning architectures for Speech Emotion Recognition , 2017, Neural Networks.
[20] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Gwen Littlewort,et al. Multiple kernel learning for emotion recognition in the wild , 2013, ICMI '13.
[22] Zhong-Qiu Wang,et al. Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).