[1] Luke S. Zettlemoyer,et al. Online Learning of Relaxed CCG Grammars for Parsing to Logical Form , 2007, EMNLP.
[2] Sanja Fidler,et al. VirtualHome: Simulating Household Activities Via Programs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[3] Licheng Yu,et al. Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout , 2019, NAACL.
[4] Craig A. Knoblock,et al. PDDL-the planning domain definition language , 1998.
[5] Ghassan Al-Regib,et al. Self-Monitoring Navigation Agent via Auxiliary Progress Estimation , 2019, ICLR.
[6] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[7] Ali Farhadi,et al. RoboTHOR: An Open Simulation-to-Real Embodied AI Platform , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Matthew R. Walter,et al. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.
[9] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[10] Andrew Chou,et al. Semantic Parsing on Freebase from Question-Answer Pairs , 2013, EMNLP.
[11] Sergio Gomez Colmenarejo,et al. Hybrid computing using a neural network with dynamic external memory , 2016, Nature.
[12] Dan Klein,et al. Alignment-Based Compositional Semantics for Instruction Following , 2015, EMNLP.
[13] Raymond J. Mooney,et al. Learning to Parse Database Queries Using Inductive Logic Programming , 1996, AAAI/IAAI, Vol. 2.
[14] Ghassan Al-Regib,et al. The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Yuan-Fang Wang,et al. Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Pierre Sermanet,et al. Grounding Language in Play , 2020, ArXiv.
[18] Stefanie Tellex,et al. Interpreting and Executing Recipes with a Cooking Robot , 2012, ISER.
[19] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[20] Tao Yu,et al. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task , 2018, EMNLP.
[21] Qi Wu,et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[22] Stefanie Tellex,et al. Sequence-to-Sequence Language Grounding of Non-Markovian Task Specifications , 2018, Robotics: Science and Systems.
[23] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[24] Dan Klein,et al. Speaker-Follower Models for Vision-and-Language Navigation , 2018, NeurIPS.
[25] Stefanie Tellex,et al. Grounding Language to Non-Markovian Tasks with No Supervision of Task Specifications , 2020, Robotics: Science and Systems.
[26] Benjamin Kuipers,et al. Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions , 2006, AAAI.
[27] Matthew R. Walter,et al. Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences , 2015, AAAI.
[28] Fei Sha,et al. BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps , 2020, ACL.
[29] Jonghyun Choi,et al. MOCA: A Modular Object-Centric Approach for Interactive Instruction Following , 2020, ArXiv.
[30] Moritz Tenorth,et al. Understanding and executing instructions for everyday manipulation tasks from the World Wide Web , 2010, 2010 IEEE International Conference on Robotics and Automation.
[31] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[32] Luke S. Zettlemoyer,et al. Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions , 2013, TACL.
[33] Jitendra Malik,et al. On Evaluation of Embodied Navigation Agents , 2018, ArXiv.
[34] Xiaojun Chang,et al. Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Siddhartha S. Srinivasa,et al. Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Guido Bugmann,et al. Using verbal instructions for route learning: Instruction Analysis , 2001.
[37] Yuankai Qi,et al. A Recurrent Vision-and-Language BERT for Navigation , 2020, ArXiv.
[38] Zohar Manna,et al. The Temporal Logic of Reactive and Concurrent Systems , 1991, Springer New York.
[39] Luke S. Zettlemoyer,et al. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars , 2005, UAI.
[40] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[41] Kevin Lee,et al. Tell me Dave: Context-sensitive grounding of natural language to manipulation instructions , 2014, Int. J. Robotics Res.
[42] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[43] Jason Baldridge,et al. Transferable Representation Learning in Vision-and-Language Navigation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[44] Sergey Levine,et al. Causal Confusion in Imitation Learning , 2019, NeurIPS.
[45] Felix Hill,et al. Object-based attention for spatio-temporal reasoning: Outperforming neuro-symbolic models with flexible distributed architectures , 2020, ArXiv.
[46] Jacob Krantz,et al. Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments , 2020, ECCV.
[47] Chen Sun,et al. Multi-modal Transformer for Video Retrieval , 2020, ECCV.
[48] Raymond J. Mooney,et al. Learning to Interpret Natural Language Navigation Instructions from Observations , 2011, Proceedings of the AAAI Conference on Artificial Intelligence.
[49] Xin Wang,et al. Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Luke S. Zettlemoyer,et al. Reinforcement Learning for Mapping Instructions to Actions , 2009, ACL.
[51] Nicholas Roy,et al. Efficient grounding of abstract spatial concepts for natural language interaction with robot platforms , 2018, Int. J. Robotics Res.
[52] Roozbeh Mottaghi,et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Matthew J. Hausknecht,et al. TextWorld: A Learning Environment for Text-based Games , 2018, CGW@IJCAI.
[54] John Langford,et al. Mapping Instructions and Visual Observations to Actions with Reinforcement Learning , 2017, EMNLP.
[55] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[56] Geoffrey E. Hinton,et al. Grammar as a Foreign Language , 2014, NIPS.
[57] Jitendra Malik,et al. Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[58] Ali Farhadi,et al. AI2-THOR: An Interactive 3D Environment for Visual AI , 2017, ArXiv.
[59] Arjun Majumdar,et al. Improving Vision-and-Language Navigation with Image-Text Pairs from the Web , 2020, ECCV.
[60] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[61] Sergey Levine,et al. Learning Latent Plans from Play , 2019, CoRL.
[62] Jason Baldridge,et al. Retouchdown: Adding Touchdown to StreetLearn as a Shareable Resource for Language Grounding Tasks in Street View , 2020, ArXiv.
[63] Roma Patel,et al. Learning to Ground Language to Temporal Logical Form , 2019.
[64] Jason Baldridge,et al. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding , 2020, EMNLP.
[65] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[66] Matthew J. Hausknecht,et al. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning , 2020, ICLR.
[67] Yoav Artzi,et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Razvan Pascanu,et al. Stabilizing Transformers for Reinforcement Learning , 2019, ICML.
[69] Willem Zuidema,et al. Quantifying Attention Flow in Transformers , 2020, ACL.
[70] Andrew Zisserman,et al. Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).