A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

[1] Mostafa Dehghani, et al. VUT: Versatile UI Transformer for Multi-Modal Multi-Task User Interface Modeling, 2021, ArXiv.

[2] Devendra Singh Chaplot, et al. FILM: Following Instructions in Language with Modular Methods, 2021, ICLR.

[3] Kota Yamaguchi, et al. CanvasVAE: Learning to Generate Vector Graphic Documents, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[4] Dieter Fox, et al. A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution, 2021, CoRL.

[5] Bhargava Urala Kota, et al. DocFormer: End-to-End Transformer for Document Understanding, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[6] Hongfu Liu, et al. SelfDoc: Self-Supervised Document Representation Learning, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Chih-Yao Ma, et al. Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation, 2021, 2021 IEEE International Conference on Robotics and Automation (ICRA).

[8] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.

[9] Quoc V. Le, et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, 2021, ICML.

[10] Toby Jia-Jun Li, et al. Screen2Vec: Semantic Embedding of GUI Screens and GUI Components, 2021, CHI.

[11] Kunal Pratap Singh, et al. Factorizing Perception and Policy for Interactive Instruction Following, 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[12] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.

[13] Toby Jia-Jun Li, et al. Demonstration + Natural Language: Multimodal Interfaces for GUI-Based Interactive Task Learning Agents, 2021, Human–Computer Interaction Series.

[14] Ranjay Krishna, et al. Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning, 2020, W-NUT@EMNLP.

[15] Tom M. Mitchell, et al. Multi-Modal Repairs of Conversational Breakdowns in Task-Oriented Dialogs, 2020, UIST.

[16] Jason Baldridge, et al. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding, 2020, EMNLP.

[17] Xin Zhou, et al. Mapping Natural Language Instructions to Mobile UI Action Sequences, 2020, ACL.

[18] Luke Zettlemoyer, et al. ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Xiaojun Chang, et al. Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Apu Kapadia, et al. "I am uncomfortable sharing what I can't see": Privacy Concerns of the Visually Impaired with Camera Based Assistive Applications, 2020, USENIX Security Symposium.

[21] Hal Daumé, et al. Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning, 2019, EMNLP.

[22] Ryen W. White, et al. Bridging Screen Readers and Voice Assistants for Enhanced Eyes-Free Web Search, 2019, WWW.

[23] Philip H. S. Torr, et al. Visual Dialogue without Vision or Dialogue, 2018, ArXiv.

[24] Thomas F. Liu, et al. Learning Design Semantics for Mobile Apps, 2018, UIST.

[25] Percy Liang, et al. Mapping natural language commands to web elements, 2018, EMNLP.

[26] Sanja Fidler, et al. VirtualHome: Simulating Household Activities Via Programs, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27] Jiebo Luo, et al. VizWiz Grand Challenge: Answering Visual Questions from Blind People, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28] Ali Farhadi, et al. IQA: Visual Question Answering in Interactive Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Stefan Lee, et al. Embodied Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[30] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31] Guillaume Lample, et al. Word Translation Without Parallel Data, 2017, ICLR.

[32] Jeffrey Nichols, et al. Rico: A Mobile App Dataset for Building Data-Driven Design Applications, 2017, UIST.

[33] Percy Liang, et al. World of Bits: An Open-Domain Platform for Web-Based Agents, 2017, ICML.

[34] Chen Sun, et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[36] Ali Farhadi, et al. Visual Semantic Planning Using Deep Successor Representations, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37] Amos Azaria, et al. SUGILITE: Creating Multimodal Smartphone Automation by Demonstration, 2017, CHI.

[38] Tomas Mikolov, et al. Enriching Word Vectors with Subword Information, 2016, TACL.

[39] Ranjitha Kumar, et al. ERICA: Interaction Mining Mobile Apps, 2016, UIST.

[40] Gordon Christie, et al. Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions, 2016, EMNLP.

[41] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] David J. Crandall, et al. Privacy Concerns and Behaviors of People with Visual Impairments, 2015, CHI.

[43] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008, Journal of Machine Learning Research.

[44] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[45] S. P. Lloyd. Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.