Combining cross-modal knowledge transfer and semi-supervised learning for speech emotion recognition
Min Chen | Yiling Wu | Sheng Zhang | Yuan-Fang Li | Minglei Li | Jincai Chen | Chuanbo Zhu