Guofa Li, Dongpu Cao, Lang Su, Chuqing Hu
[1] Dinesh Manocha, et al. M3ER: Multiplicative Multimodal Emotion Recognition Using Facial, Textual, and Speech Cues, 2020, AAAI.
[2] Yingyu Liang, et al. Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis, 2019, AAAI.
[3] Ning Xu, et al. Learn to Combine Modalities in Multimodal Deep Learning, 2018, arXiv.
[4] Yutaka Satoh, et al. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Wei Liu, et al. Multimodal Emotion Recognition Using Deep Canonical Correlation Analysis, 2019, arXiv.
[6] Yoni Bauduin, et al. Audio-Visual Speech Recognition, 2004.
[7] Erik Cambria, et al. Tensor Fusion Network for Multimodal Sentiment Analysis, 2017, EMNLP.
[8] Rita Noumeir, et al. Infrared and 3D Skeleton Feature Fusion for RGB-D Action Recognition, 2020, IEEE Access.
[9] Licheng Yu, et al. TVQA: Localized, Compositional Video Question Answering, 2018, EMNLP.
[10] Frédéric Jurie, et al. CentralNet: a Multilayer Approach for Multimodal Fusion, 2018, ECCV Workshops.
[11] Trevor Darrell, et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[12] Jennifer Williams, et al. DNN Multimodal Fusion Techniques for Predicting Video Sentiment, 2018.
[13] Guofa Li, et al. A Spontaneous Driver Emotion Facial Expression (DEFE) Dataset for Intelligent Vehicles: Emotions Triggered by Video-Audio Clips in Driving Scenarios, 2020, IEEE Transactions on Affective Computing.
[14] Eric Granger, et al. Multimodal Fusion with Deep Neural Networks for Audio-Video Emotion Recognition, 2019, arXiv.
[15] Feiping Nie, et al. Dense Multimodal Fusion for Hierarchically Joint Representation, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[17] Stephen J. Maybank, et al. Feedback Graph Convolutional Network for Skeleton-Based Action Recognition, 2020, IEEE Transactions on Image Processing.
[18] Christian Wolf, et al. Sequential Deep Learning for Human Action Recognition, 2011, HBU.
[19] Enhua Wu, et al. Squeeze-and-Excitation Networks, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Chao Li, et al. Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation, 2018, IJCAI.
[21] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[22] Erik Cambria, et al. Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph, 2018, ACL.
[23] Yang Gao, et al. Compact Bilinear Pooling, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Roger Zimmermann, et al. MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis, 2020, ACM Multimedia.
[25] Guangming Shi, et al. SGM-Net: Skeleton-guided multimodal network for action recognition, 2020, Pattern Recognition.
[26] Chongruo Wu, et al. ResNeSt: Split-Attention Networks, 2020, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[27] Frédéric Jurie, et al. MFAS: Multimodal Fusion Architecture Search, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Chirag N. Paunwala, et al. Improved weight assignment approach for multimodal fusion, 2014, 2014 International Conference on Circuits, Systems, Communication and Information Technology Applications (CSCITA).
[29] Jean Maillard, et al. Black Holes and White Rabbits: Metaphor Identification with Visual Features, 2016, NAACL.
[30] Jianyou Wang, et al. Speech Emotion Recognition with Dual-Sequence LSTM Architecture, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Minetada Osano, et al. Towards recognizing emotion with affective dimensions through body gestures, 2006, 7th International Conference on Automatic Face and Gesture Recognition (FGR06).
[32] Louis-Philippe Morency, et al. MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos, 2016, arXiv.
[33] Stéphane Ayache, et al. Majority Vote of Diverse Classifiers for Late Fusion, 2014, S+SSPR.
[34] A. Murat Tekalp, et al. Multimodal Speaker Identification Using Canonical Correlation Analysis, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[35] Gang Wang, et al. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Christian Wolf, et al. ModDrop: Adaptive Multi-Modal Gesture Recognition, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Sridha Sridharan, et al. Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition, 2018, Computer Vision and Image Understanding.
[38] Songlong Xing, et al. Locally Confined Modality Fusion Network With a Global Perspective for Multimodal Human Affective Computing, 2020, IEEE Transactions on Multimedia.
[39] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Jong-Seok Lee, et al. EmbraceNet: A robust deep learning architecture for multimodal classification, 2019, Information Fusion.
[41] Junsong Yuan, et al. Recognizing Human Actions as the Evolution of Pose Estimation Maps, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Chengxin Li, et al. Speech emotion recognition with acoustic and lexical features, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Daniel Roggen, et al. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition, 2016, Sensors.
[44] Bao-Liang Lu, et al. Multimodal emotion recognition using EEG and eye tracking data, 2014, 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society.
[45] John Kane, et al. COVAREP — A collaborative voice analysis repository for speech technologies, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Amirreza Shaban, et al. MMTM: Multimodal Transfer Module for CNN Fusion, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Zhongkai Sun, et al. Multi-modal Sentiment Analysis using Deep Canonical Correlation Analysis, 2019, INTERSPEECH.
[49] Ruslan Salakhutdinov, et al. Learning Factorized Multimodal Representations, 2018, ICLR.
[50] S. R. Livingstone, et al. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English, 2018, PLoS ONE.
[51] Jeff A. Bilmes, et al. Deep Canonical Correlation Analysis, 2013, ICML.
[52] Christian Wolf, et al. Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.