Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation
In So Kweon | Jae Won Cho | Dong-Jin Kim | Jinsoo Choi | Yunjae Jung
[1] Vittorio Murino, et al. Modality Distillation with Multiple Stream Networks for Action Recognition, 2018, ECCV.
[2] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[3] Yang Gao, et al. Compact Bilinear Pooling, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Mario Fritz, et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Matthieu Cord, et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Tae-Hyun Oh, et al. Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Byoung-Tak Zhang, et al. Bilinear Attention Networks, 2018, NeurIPS.
[8] Tae-Hyun Oh, et al. Disjoint Multi-task Learning Between Heterogeneous Human-Centric Tasks, 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[9] Zhou Yu, et al. Deep Modular Co-Attention Networks for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Trevor Darrell, et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[11] Jiasen Lu, et al. Hierarchical Question-Image Co-Attention for Visual Question Answering, 2016, NIPS.
[12] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Kristen Grauman, et al. CrowdVerge: Predicting If People Will Agree on the Answer to a Visual Question, 2017, CHI.
[14] Trevor Darrell, et al. Fooling Vision and Language Models Despite Localization and Attention Mechanism, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] Tae-Hyun Oh, et al. Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach, 2019, EMNLP/IJCNLP.
[16] Qing Li, et al. Why Does a Visual Question Have Different Answers?, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Trevor Darrell, et al. Learning with Side Information through Modality Hallucination, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and VQA, 2017, ArXiv.
[19] Jiebo Luo, et al. VizWiz Grand Challenge: Answering Visual Questions from Blind People, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.
[21] Cordelia Schmid, et al. MARS: Motion-Augmented RGB Stream for Action Recognition, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[24] Alexander J. Smola, et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Bernhard Schölkopf, et al. Unifying distillation and privileged information, 2015, ICLR.
[26] Matthieu Cord, et al. MUTAN: Multimodal Tucker Fusion for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Zhou Yu, et al. Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering, 2017, IEEE Transactions on Neural Networks and Learning Systems.
[28] Vladimir Vapnik, et al. A new learning paradigm: Learning using privileged information, 2009, Neural Networks.
[29] Vittorio Murino, et al. Distillation Multiple Choice Learning for Multimodal Action Recognition, 2019, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[30] In So Kweon, et al. Detecting Human-Object Interactions with Action Co-occurrence Priors, 2020, ECCV.
[31] Dhruv Batra, et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[32] Juan Carlos Niebles, et al. Graph Distillation for Action Detection with Privileged Modalities, 2017, ECCV.
[33] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[34] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[35] Jitendra Malik, et al. Cross Modal Distillation for Supervision Transfer, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).