暂无分享,去创建一个
[1] Anoop Cherian,et al. Audio Visual Scene-Aware Dialog , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Hugo Larochelle,et al. GuessWhat?! Visual Object Discovery through Multi-modal Dialogue , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Rui Wang,et al. Deep Audio-visual Learning: A Survey , 2020, International Journal of Automation and Computing.
[5] Ali Farhadi,et al. Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Marie-Francine Moens,et al. Talk2Car: Taking Control of Your Self-Driving Car , 2019, EMNLP.
[7] Zhanxing Zhu,et al. Spatio-temporal Graph Convolutional Neural Network: A Deep Learning Framework for Traffic Forecasting , 2017, IJCAI.
[8] Kai Chen,et al. MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.
[9] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[10] Tamir Hazan,et al. Factor Graph Attention , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Anton van den Hengel,et al. Graph-Structured Representations for Visual Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Philip S. Yu,et al. A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[14] Silvio Savarese,et al. Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Yu Cheng,et al. Relation-Aware Graph Attention Network for Visual Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[17] Stefan Lee,et al. Graph R-CNN for Scene Graph Generation , 2018, ECCV.
[18] Yale Song,et al. TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Qi Wu,et al. Are You Talking to Me? Reasoned Visual Dialog Generation Through Adversarial Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[21] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[22] Anoop Cherian,et al. Multimodal Attention for Fusion of Audio and Spatiotemporal Features for Video Description , 2018, CVPR Workshops.
[23] Yueting Zhuang,et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion , 2017, ACM Multimedia.
[24] Doyen Sahoo,et al. Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems , 2019, ACL.
[25] Yue Wang,et al. Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..
[26] John R. Hershey,et al. Attention-Based Multimodal Fusion for Video Description , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Ji Zhang,et al. Large-Scale Visual Relationship Understanding , 2018, AAAI.
[28] Pietro Liò,et al. Graph Attention Networks , 2017, ICLR.
[29] Michael S. Bernstein,et al. Scene Graph Prediction with Limited Labels , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[30] Ji Zhang,et al. Relationship Proposal Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Jianfei Cai,et al. Auto-Encoding Scene Graphs for Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Tuomas Virtanen,et al. Clotho: an Audio Captioning Dataset , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Bohyung Han,et al. Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[35] Hugo Larochelle,et al. Modulating early visual processing by language , 2017, NIPS.
[36] Andrew Zisserman,et al. Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[38] Yu Cheng,et al. Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog , 2019, ACL.
[39] Tamir Hazan,et al. A Simple Baseline for Audio-Visual Scene-Aware Dialog , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Tao Mei,et al. Exploring Visual Relationship for Image Captioning , 2018, ECCV.
[41] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[42] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[43] Jia Deng,et al. Pixels to Graphs by Associative Embedding , 2017, NIPS.
[44] Xiaogang Wang,et al. Question-Guided Hybrid Convolution for Visual Question Answering , 2018, ECCV.
[45] Yun-Nung Chen,et al. Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling , 2019, ArXiv.
[46] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Sarah Parisot,et al. Learning Conditioned Graph Structures for Interpretable Visual Question Answering , 2018, NeurIPS.
[48] Shalini Ghosh,et al. Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention , 2019, ArXiv.
[49] Michael S. Bernstein,et al. Visual Relationship Detection with Language Priors , 2016, ECCV.
[50] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[51] Anoop Cherian,et al. End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[52] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[53] Matthieu Cord,et al. BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection , 2019, AAAI.
[54] Peng Gao,et al. Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Donald Geman,et al. Visual Turing test for computer vision systems , 2015, Proceedings of the National Academy of Sciences.
[56] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[57] Ji Zhang,et al. An Interpretable Model for Scene Graph Generation , 2018, ArXiv.
[58] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[59] Andrew Zisserman,et al. Learning Visual Attributes , 2007, NIPS.
[60] Jaewoo Kang,et al. Self-Attention Graph Pooling , 2019, ICML.
[61] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[62] Zhiyuan Liu,et al. Graph Neural Networks: A Review of Methods and Applications , 2018, AI Open.
[63] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Geoffrey E. Hinton,et al. When Does Label Smoothing Help? , 2019, NeurIPS.
[65] Yi Yang,et al. Entangled Transformer for Image Captioning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[66] Licheng Yu,et al. TVQA: Localized, Compositional Video Question Answering , 2018, EMNLP.
[67] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[68] Shuqiang Jiang,et al. Know More Say Less: Image Captioning Based on Scene Graphs , 2019, IEEE Transactions on Multimedia.
[69] Michael S. Bernstein,et al. Visual Relationships as Functions:Enabling Few-Shot Scene Graph Prediction , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[70] Grace Hui Yang,et al. VideoQA: question answering on news video , 2003, MULTIMEDIA '03.
[71] Peter Stone,et al. Improving Grounded Natural Language Understanding through Human-Robot Dialog , 2019, 2019 International Conference on Robotics and Automation (ICRA).