Shizhe Chen | Pierre-Louis Guhur | Cordelia Schmid | Ivan Laptev
[1] Jianfeng Gao,et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.
[2] Caiming Xiong,et al. Structured Scene Memory for Vision-Language Navigation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[5] Licheng Yu,et al. MAttNet: Modular Attention Network for Referring Expression Comprehension , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Quoc V. Le,et al. Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[7] Cordelia Schmid,et al. Episodic Transformer for Vision-and-Language Navigation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[9] Yoav Artzi,et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Vladlen Koltun,et al. Semi-parametric Topological Memory for Navigation , 2018, ICLR.
[11] Marcella Cornia,et al. Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation , 2019, ArXiv.
[12] Razvan Pascanu,et al. Stabilizing Transformers for Reinforcement Learning , 2019, ICML.
[13] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[14] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[15] Xiaojun Chang,et al. Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Ashish Vaswani,et al. Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation , 2019, ACL.
[17] Hao Tan,et al. Diagnosing the Environment Bias in Vision-and-Language Navigation , 2020, IJCAI.
[18] Yuankai Qi,et al. A Recurrent Vision-and-Language BERT for Navigation , 2020, ArXiv.
[19] Veselin Stoyanov,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[20] Jianfeng Gao,et al. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-Training , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[22] Jason Baldridge,et al. General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping , 2019, ViGIL@NeurIPS.
[23] Marc'Aurelio Ranzato,et al. Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.
[24] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[25] Jacob Krantz,et al. Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments , 2020, ECCV.
[26] Qi Wu,et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Yu Cheng,et al. Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models , 2020, ECCV.
[28] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[29] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Rahul Sukthankar,et al. Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.
[31] Silvio Savarese,et al. Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Arjun Majumdar,et al. Improving Vision-and-Language Navigation with Image-Text Pairs from the Web , 2020, ECCV.
[33] Robinson Piramuthu,et al. VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator , 2021, ArXiv.
[34] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[35] Jason Baldridge,et al. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding , 2020, EMNLP.
[36] Chunhua Shen,et al. REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Aapo Hyvärinen,et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.
[38] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.
[39] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[40] Yizhou Yu,et al. Scene-Intuitive Agent for Remote Embodied Visual Grounding , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Roozbeh Mottaghi,et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Jacob Devlin,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[43] Ildoo Kim,et al. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision , 2021, ICML.
[44] Matthias Nießner,et al. Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).
[45] Licheng Yu,et al. Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout , 2019, NAACL.
[46] Ghassan Al-Regib,et al. Self-Monitoring Navigation Agent via Auxiliary Progress Estimation , 2019, ICLR.
[47] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[48] Yuan-Fang Wang,et al. Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Dan Klein,et al. Speaker-Follower Models for Vision-and-Language Navigation , 2018, NeurIPS.
[50] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[51] Yuankai Qi,et al. Language and Visual Entity Relationship Graph for Agent Navigation , 2020, NeurIPS.
[52] Jesse Thomason,et al. Vision-and-Dialog Navigation , 2019, CoRL.
[53] Jianfeng Gao,et al. Robust Navigation with Language Pretraining and Stochastic Sampling , 2019, EMNLP.
[54] Chunhua Shen,et al. Soft Expert Reward Learning for Vision-and-Language Navigation , 2020, ECCV.
[55] Zornitsa Kozareva,et al. Environment-agnostic Multitask Learning for Natural Language Grounded Navigation , 2020, ECCV.
[56] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[57] Olga Russakovsky,et al. Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation , 2020, NeurIPS.
[58] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[59] Jitendra Malik,et al. On Evaluation of Embodied Navigation Agents , 2018, ArXiv.