Cordelia Schmid | Ivan Laptev | Shizhe Chen | Makarand Tapaswi | Pierre-Louis Guhur
[1] Xiaojun Chang, et al. Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks. CVPR, 2020.
[2] Yuankai Qi, et al. A Recurrent Vision-and-Language BERT for Navigation. arXiv, 2020.
[3] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments. CVPR, 2018.
[4] Dan Klein, et al. Speaker-Follower Models for Vision-and-Language Navigation. NeurIPS, 2018.
[5] Chunhua Shen, et al. REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments. CVPR, 2020.
[6] Matthias Nießner, et al. Matterport3D: Learning from RGB-D Data in Indoor Environments. 3DV, 2017.
[7] Licheng Yu, et al. Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout. NAACL, 2019.
[8] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks. ECCV, 2020.
[9] Arjun Majumdar, et al. Improving Vision-and-Language Navigation with Image-Text Pairs from the Web. ECCV, 2020.
[10] Ghassan Al-Regib, et al. Self-Monitoring Navigation Agent via Auxiliary Progress Estimation. ICLR, 2019.
[11] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL, 2019.
[12] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. NeurIPS, 2019.
[13] Yoav Artzi, et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments. CVPR, 2019.
[14] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning. ACL, 2018.
[15] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision. ICML, 2021.
[16] Hal Daumé, et al. Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning. EMNLP, 2019.
[17] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. International Journal of Computer Vision, 2016.
[18] Bolei Zhou, et al. Places: A 10 Million Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[19] Khanh Nguyen, et al. Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention. CVPR, 2019.
[20] Vicente Ordonez, et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes. EMNLP, 2014.
[21] Jason Baldridge, et al. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding. EMNLP, 2020.
[22] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. CVPR, 2018.
[23] Dumitru Erhan, et al. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[24] Stephen Gould, et al. Sub-Instruction Aware Vision-and-Language Navigation. EMNLP, 2020.
[25] Hao Tan, et al. Diagnosing the Environment Bias in Vision-and-Language Navigation. IJCAI, 2020.
[26] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning. ECCV, 2020.
[27] Yuan-Fang Wang, et al. Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation. CVPR, 2019.
[28] Mark Hopkins, et al. Extending a Parser to Distant Domains Using a Few Dozen Partially Annotated Examples. ACL, 2018.
[29] Roozbeh Mottaghi, et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks. CVPR, 2020.
[30] Xin Wang, et al. Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation. ECCV, 2018.
[31] Jason Baldridge, et al. Transferable Representation Learning in Vision-and-Language Navigation. ICCV, 2019.
[32] Chunhua Shen, et al. Soft Expert Reward Learning for Vision-and-Language Navigation. ECCV, 2020.
[33] Ghassan Al-Regib, et al. The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation. CVPR, 2019.
[34] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations. ICLR, 2020.
[35] David Reitter, et al. Fusion of Detected Objects in Text for Visual Question Answering. EMNLP, 2019.
[36] Vicente Ordonez, et al. Im2Text: Describing Images Using 1 Million Captioned Photographs. NIPS, 2011.
[37] Jesse Thomason, et al. Vision-and-Dialog Navigation. CoRL, 2019.
[38] Jianfeng Gao, et al. Robust Navigation with Language Pretraining and Stochastic Sampling. EMNLP, 2019.
[39] Margaret Mitchell, et al. VQA: Visual Question Answering. International Journal of Computer Vision, 2015.
[40] Yin Li, et al. Learning Deep Structure-Preserving Image-Text Embeddings. CVPR, 2016.
[41] Lukasz Kaiser, et al. Attention Is All You Need. NIPS, 2017.
[42] Marcus Rohrbach, et al. 12-in-1: Multi-Task Vision and Language Representation Learning. CVPR, 2020.
[43] Ivan Laptev, et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips. ICCV, 2019.
[44] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. EMNLP, 2019.
[45] Jitendra Malik, et al. On Evaluation of Embodied Navigation Agents. arXiv, 2018.
[46] Anton van den Hengel, et al. Object-and-Action Aware Model for Visual Language Navigation. ECCV, 2020.
[47] Jianfeng Gao, et al. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training. CVPR, 2020.
[48] Jan Kautz, et al. Contrastive Learning for Weakly Supervised Phrase Grounding. ECCV, 2020.
[49] Jacob Krantz, et al. Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments. ECCV, 2020.