Connective Cognition Network for Directional Visual Commonsense Reasoning
Yi Yang | Yahong Han | Aming Wu | Linchao Zhu | Yezhou Yang
[1] Bernhard A. Sabel,et al. Dynamic reorganization of brain functional networks during cognition , 2015, NeuroImage.
[2] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[3] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[5] Vimla L. Patel,et al. Cognitive models of directional inference in expert medical reasoning , 1997 .
[6] Mo Yu,et al. Exploiting Rich Syntactic Information for Semantic Parsing with Graph-to-Sequence Model , 2018, EMNLP.
[7] Ali Farhadi,et al. From Recognition to Cognition: Visual Commonsense Reasoning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Zhiyuan Liu,et al. Graph Neural Networks: A Review of Methods and Applications , 2018, AI Open.
[9] Gim Hee Lee,et al. PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Carl Doersch,et al. Learning Visual Question Answering by Bootstrapping Hard Attention , 2018, ECCV.
[11] Matthieu Cord,et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[13] Xing Zhang,et al. Non-local NetVLAD Encoding for Video Classification , 2018, ECCV Workshops.
[14] Shuicheng Yan,et al. Graph-Based Global Reasoning Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[16] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[17] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.
[18] Roberto Cipolla,et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[19] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Mitesh M. Khapra,et al. Efficient Video Classification Using Fewer Frames , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Svetlana Lazebnik,et al. Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering , 2018, NeurIPS.
[22] Matthieu Cord,et al. MUTAN: Multimodal Tucker Fusion for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] Jonathan Masci,et al. Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Peng Gao,et al. Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Ivan Laptev,et al. Learnable pooling with Context Gating for video classification , 2017, ArXiv.
[26] Tamir Hazan,et al. Factor Graph Attention , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Karl J. Friston,et al. Structural and Functional Brain Networks: From Connections to Cognition , 2013, Science.
[28] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[29] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[30] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[31] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Aaron C. Courville,et al. FiLM: Visual Reasoning with a General Conditioning Layer , 2017, AAAI.
[33] Sarah Parisot,et al. Learning Conditioned Graph Structures for Interpretable Visual Question Answering , 2018, NeurIPS.
[34] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Tomohide Shibata. Understand It in 5 Minutes!? A Quick Skim of Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .
[36] Ross B. Girshick,et al. Fast R-CNN , 2015, arXiv:1504.08083.
[37] Allan Jabri,et al. Revisiting Visual Question Answering Baselines , 2016, ECCV.
[38] Byoung-Tak Zhang,et al. Multimodal Dual Attention Memory for Video Story Question Answering , 2018, ECCV.
[39] Tomás Pajdla,et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[40] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[41] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.