Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding
Alexander Ku | Peter Anderson | Roma Patel | Eugene Ie | Jason Baldridge