Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
暂无分享,去创建一个
[1] Michel Denis,et al. When and Why Are Visual Landmarks Used in Giving Directions? , 2001, COSIT.
[2] Samarth Brahmbhatt,et al. DeepNav: Learning to Navigate Large Cities , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Rahul Sukthankar,et al. Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.
[4] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Daniel Jurafsky,et al. Learning to Follow Navigational Directions , 2010, ACL.
[6] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[7] Xin Wang,et al. Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation , 2018, ECCV.
[8] Stephan Winter,et al. Structural Salience of Landmarks for Route Directions , 2005, COSIT.
[9] Jitendra Malik,et al. Visual Memory for Robust Path Following , 2018, NeurIPS.
[10] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[11] Yoav Artzi,et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Khanh Nguyen,et al. Vision-Based Navigation With Language-Based Assistance via Imitation Learning With Indirect Intervention , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Ali Farhadi,et al. Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] John F. Canny,et al. Grounding Human-To-Vehicle Advice for Self-Driving Vehicles , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Razvan Pascanu,et al. Learning to Navigate in Complex Environments , 2016, ICLR.
[16] Silvio Savarese,et al. Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation , 2018, EMNLP.
[17] Ilya Kostrikov,et al. PlaNet - Photo Geolocation with Convolutional Neural Networks , 2016, ECCV.
[18] Luc Van Gool,et al. Navigation using special buildings as signposts , 2014, MapInteract '14.
[19] Jitendra Malik,et al. On Evaluation of Embodied Navigation Agents , 2018, ArXiv.
[20] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[21] Trevor Darrell,et al. Language-Conditioned Graph Networks for Relational Reasoning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[22] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[23] Xiaogang Wang,et al. Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Ali Farhadi,et al. IQA: Visual Question Answering in Interactive Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] L. Gool,et al. Talk2Car: Taking Control of Your Self-Driving Car , 2019, EMNLP.
[26] Ali Farhadi,et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).
[27] Emily M. Bender,et al. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science , 2018, TACL.
[28] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[29] Jason Weston,et al. Talk the Walk: Navigating New York City through Grounded Dialogue , 2018, ArXiv.
[30] Luc Van Gool,et al. Object Referring in Videos with Language and Human Gaze , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Raia Hadsell,et al. Learning To Follow Directions in Street View , 2019, AAAI.
[32] Luc Van Gool,et al. End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners , 2018, ECCV.
[33] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[34] Dan Klein,et al. Speaker-Follower Models for Vision-and-Language Navigation , 2018, NeurIPS.
[35] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.
[36] Luc Van Gool,et al. Learning Accurate, Comfortable and Human-like Driving , 2019, ArXiv.
[37] Demis Hassabis,et al. Grounded Language Learning in a Simulated 3D World , 2017, ArXiv.
[38] Paul U. Lee,et al. Pictorial and Verbal Tools for Conveying Routes , 1999, COSIT.
[39] M. Denis,et al. Language and spatial cognition: comparing the roles of landmarks and street names in route instructions , 2004 .
[40] Alexander G. Schwing,et al. Convolutional Image Captioning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[41] Yale Song,et al. Video2GIF: Automatic Generation of Animated GIFs from Video , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[43] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[44] Ghassan Al-Regib,et al. Self-Monitoring Navigation Agent via Auxiliary Progress Estimation , 2019, ICLR.
[45] Yuan-Fang Wang,et al. Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[47] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[48] Paul U. Lee,et al. Wayfinding choremes - a language for modeling conceptual route knowledge , 2005, J. Vis. Lang. Comput..
[49] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[50] Toru Ishikawa,et al. Landmark Selection in the Environment: Relationships with Object Characteristics and Sense of Direction , 2012, Spatial Cogn. Comput..
[51] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[52] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[53] T. Tenbrink,et al. Would you follow your own route description? Cognitive strategies in urban route planning , 2011, Cognition.
[54] Sergio Gomez Colmenarejo,et al. Hybrid computing using a neural network with dynamic external memory , 2016, Nature.
[55] Stefan Lee,et al. Embodied Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[56] Luc Van Gool,et al. Object Referring in Visual Scene with Spoken Language , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[57] Raia Hadsell,et al. Learning to Navigate in Cities Without a Map , 2018, NeurIPS.
[58] Jean Oh,et al. Grounding spatial relations for outdoor robot navigation , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[59] Michel Denis,et al. Referring to Landmark or Street Information in Route Directions: What Difference Does It Make? , 2003, COSIT.
[60] Christopher D. Manning,et al. Compositional Attention Networks for Machine Reasoning , 2018, ICLR.
[61] Luc Van Gool,et al. Mapping, Localization and Path Planning for Image-Based Navigation Using Visual Features and Map , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Qi Wu,et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[63] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[64] Byoungkwon An,et al. Looking Beyond the Visible Scene , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[65] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.
[66] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[67] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[68] Raymond J. Mooney,et al. Learning to Interpret Natural Language Navigation Instructions from Observations , 2011, Proceedings of the AAAI Conference on Artificial Intelligence.
[69] Siddhartha S. Srinivasa,et al. Tactical Rewind: Self-Correction via Backtracking in Vision-And-Language Navigation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[70] Andreas Geiger,et al. SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images , 2018, ECCV.
[71] Alexandra Millonig,et al. Developing Landmark-Based Pedestrian-Navigation Systems , 2007, IEEE Transactions on Intelligent Transportation Systems.
[72] Ghassan Al-Regib,et al. The Regretful Agent: Heuristic-Aided Navigation Through Progress Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[73] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[74] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[75] Maneesh Agrawala,et al. Automatic generation of tourist maps , 2008, ACM Trans. Graph..
[76] Alex Graves,et al. Adaptive Computation Time for Recurrent Neural Networks , 2016, ArXiv.
[77] Trevor Darrell,et al. Explainable Neural Computation via Stack Neural Module Networks , 2018, ECCV.
[78] Stephen Clark,et al. Understanding Grounded Language Learning Agents , 2017, ArXiv.
[79] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[80] Gregory Shakhnarovich,et al. Discriminability Objective for Training Descriptive Captions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[81] Ruslan Salakhutdinov,et al. Generating Images from Captions with Attention , 2015, ICLR.
[82] Lixiang Li,et al. Captioning Transformer with Stacked Attention Modules , 2018 .