Multi-Layer Content Interaction Through Quaternion Product for Visual Question Answering
暂无分享,去创建一个
Chiori Hori | Sen Su | Songxiang Liu | Kai Shuang | Shijie Geng | Peng Gao | Lei Shi
[1] Peng Gao,et al. Multi-Modality Latent Interaction Network for Visual Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[3] Siu Cheung Hui,et al. Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks , 2019, ACL.
[4] Zhou Yu,et al. Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Peng Gao,et al. Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Xiaogang Wang,et al. Question-Guided Hybrid Convolution for Visual Question Answering , 2018, ECCV.
[7] Anoop Cherian,et al. End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Ying Zhang,et al. Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition , 2018, INTERSPEECH.
[9] Byoung-Tak Zhang,et al. Bilinear Attention Networks , 2018, NeurIPS.
[10] Dario Pavllo,et al. QuaterNet: A Quaternion-based Recurrent Model for Human Motion , 2018, BMVC.
[11] Takayuki Okatani,et al. Improved Fusion of Visual and Language Representations by Dense Symmetric Co-attention for Visual Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Jonathon S. Hare,et al. Learning to Count Objects in Natural Images for Visual Question Answering , 2018, ICLR.
[13] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[14] Gang Sun,et al. Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[17] John R. Hershey,et al. Attention-Based Multimodal Fusion for Video Description , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[18] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2016, International Journal of Computer Vision.
[19] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[21] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[23] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[24] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[25] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[26] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[27] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[28] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[29] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[30] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.