Learning Unseen Modality Interaction