Talk2Car: Taking Control of Your Self-Driving Car
Marie-Francine Moens | Luc Van Gool | Thierry Deruyttere | Simon Vandenhende | Dusan Grujicic
[1] Stefanie Tellex, et al. Clarifying commands with information-theoretic human-robot dialog, 2013, HRI 2013.
[2] Yuandong Tian, et al. Simple Baseline for Visual Question Answering, 2015, ArXiv.
[3] Trevor Darrell, et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[4] Louis-Philippe Morency, et al. Visual Referring Expression Recognition: What Do Systems Actually Learn?, 2018, NAACL.
[5] Ali Farhadi, et al. IQA: Visual Question Answering in Interactive Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Albert S. Huang, et al. Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands, 2017, ArXiv.
[7] Trevor Darrell, et al. Natural Language Object Retrieval, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Justin Johnson, et al. DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer, 2018, ArXiv.
[9] Sebastian Ramos, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Li Fei-Fei, et al. Inferring and Executing Programs for Visual Reasoning, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[11] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[12] Stefan Lee, et al. Embodied Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[13] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[14] Mohit Shridhar, et al. Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction, 2018, Robotics: Science and Systems.
[15] Mark Johnson, et al. An Improved Non-monotonic Transition System for Dependency Parsing, 2015, EMNLP.
[16] Trevor Darrell, et al. Modeling Relationships in Referential Expressions with Compositional Modular Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Armand Joulin, et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping, 2014, NIPS.
[18] Christopher D. Manning, et al. Compositional Attention Networks for Machine Reasoning, 2018, ICLR.
[19] Alan L. Yuille, et al. Generation and Comprehension of Unambiguous Object Descriptions, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Trevor Darrell, et al. Explainable Neural Computation via Stack Neural Module Networks, 2018, ECCV.
[21] Demis Hassabis, et al. Grounded Language Learning in a Simulated 3D World, 2017, ArXiv.
[22] Li Fei-Fei, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[24] Licheng Yu, et al. MAttNet: Modular Attention Network for Referring Expression Comprehension, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] Andreas Geiger, et al. Vision meets robotics: The KITTI dataset, 2013, Int. J. Robotics Res.
[26] Wei Liu, et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.
[27] Vicente Ordonez, et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes, 2014, EMNLP.
[28] Luc Van Gool, et al. Object Referring in Videos with Language and Human Gaze, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Yoav Artzi, et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[31] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[32] Matthew R. Walter, et al. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation, 2011, AAAI.
[33] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[34] Albert S. Huang, et al. Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Language, 2013.
[35] Licheng Yu, et al. Modeling Context in Referring Expressions, 2016, ECCV.
[36] Jason Weston, et al. Talk the Walk: Navigating New York City through Grounded Dialogue, 2018, ArXiv.
[37] Dengxin Dai, et al. Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory, 2019, Int. J. Comput. Vis.
[38] Qiang Xu, et al. nuScenes: A Multimodal Dataset for Autonomous Driving, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).