暂无分享,去创建一个
Yuankai Qi | Sugato Basu | Pradyumna Narayana | Miguel Eckstein | Qi Wu | Kazoo Sone | Xin Eric Wang | William Yang Wang | Wanrong Zhu
[1] William Yang Wang,et al. Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation , 2020, FINDINGS.
[2] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[3] Christopher D. Manning,et al. Stanza: A Python Natural Language Processing Toolkit for Many Human Languages , 2020, ACL.
[4] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[5] Abhishek Das,et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[6] Yuankai Qi,et al. Language and Visual Entity Relationship Graph for Agent Navigation , 2020, NeurIPS.
[7] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[8] Dan Klein,et al. Speaker-Follower Models for Vision-and-Language Navigation , 2018, NeurIPS.
[9] Jianfeng Gao,et al. Robust Navigation with Language Pretraining and Stochastic Sampling , 2019, EMNLP.
[10] Ashish Vaswani,et al. Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation , 2019, ACL.
[11] Yejin Choi,et al. Multi-View Learning for Vision-and-Language Navigation , 2020, ArXiv.
[12] Chunhua Shen,et al. REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[14] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[15] Qi Wu,et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Jianfeng Gao,et al. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Matthias Nießner,et al. Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).
[18] Licheng Yu,et al. Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout , 2019, NAACL.
[19] Tao Mei,et al. Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions , 2018, EMNLP.
[20] Ghassan Al-Regib,et al. Self-Monitoring Navigation Agent via Auxiliary Progress Estimation , 2019, ICLR.
[21] Hal Daumé,et al. Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning , 2019, EMNLP.
[22] Xiaojun Chang,et al. Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Siddhartha S. Srinivasa,et al. Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Jason Baldridge,et al. Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View , 2020, ArXiv.
[25] Jason Baldridge,et al. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding , 2020, EMNLP.
[26] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[27] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[28] Trevor Darrell,et al. Multimodal Explanations: Justifying Decisions and Pointing to the Evidence , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Bolei Zhou,et al. Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[30] Nan Duan,et al. UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation , 2020, ArXiv.
[31] Jianlong Fu,et al. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers , 2020, ArXiv.
[32] Raymond J. Mooney,et al. Faithful Multimodal Explanation for Visual Question Answering , 2018, BlackboxNLP@ACL.
[33] Jason Baldridge,et al. Transferable Representation Learning in Vision-and-Language Navigation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[34] Chunhua Shen,et al. Soft Expert Reward Learning for Vision-and-Language Navigation , 2020, ECCV.
[35] Zornitsa Kozareva,et al. Environment-agnostic Multitask Learning for Natural Language Grounded Navigation , 2020, ECCV.
[36] Yuan-Fang Wang,et al. Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Raia Hadsell,et al. Learning to Navigate in Cities Without a Map , 2018, NeurIPS.
[38] Khanh Nguyen,et al. Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Trevor Darrell,et al. Generating Visual Explanations , 2016, ECCV.
[40] Anton van den Hengel,et al. Object-and-Action Aware Model for Visual Language Navigation , 2020, ECCV.
[41] Ghassan Al-Regib,et al. The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Xin Wang,et al. Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation , 2020, EACL.
[43] Xin Wang,et al. Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation , 2018, ECCV.
[44] Yoav Artzi,et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Nan Duan,et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training , 2019, AAAI.
[46] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).