A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility