Mapping Natural Language Instructions to Mobile UI Action Sequences

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PixelHelp, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in How-To instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PixelHelp.
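
To make the two-stage setup concrete, the sketch below shows one way a grounding model along the lines described above could be assembled: UI objects are contextually encoded from their content and screen position with a Transformer encoder and scored against a phrase representation. This is a minimal illustrative sketch, not the paper's implementation; the module names, dimensions, position features, and dot-product scoring are assumptions made here for brevity.

```python
import torch
import torch.nn as nn

class UIObjectEncoder(nn.Module):
    """Contextually encode UI objects from content tokens and screen position
    with a Transformer encoder (illustrative dimensions, not the paper's)."""
    def __init__(self, vocab_size=10000, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.content_emb = nn.Embedding(vocab_size, d_model)
        # Assumed position features: normalized (x, y, width, height) per object.
        self.pos_proj = nn.Linear(4, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, content_ids, positions):
        # content_ids: [batch, num_objects]; positions: [batch, num_objects, 4]
        x = self.content_emb(content_ids) + self.pos_proj(positions)
        return self.encoder(x)  # [batch, num_objects, d_model]

class PhraseGrounder(nn.Module):
    """Score each UI object on the screen against an object-description phrase."""
    def __init__(self, d_model=128):
        super().__init__()
        self.object_encoder = UIObjectEncoder(d_model=d_model)
        self.phrase_proj = nn.Linear(d_model, d_model)

    def forward(self, content_ids, positions, phrase_repr):
        # phrase_repr: [batch, d_model], e.g. pooled from a phrase-extraction model.
        objects = self.object_encoder(content_ids, positions)
        query = self.phrase_proj(phrase_repr).unsqueeze(-1)   # [batch, d_model, 1]
        scores = torch.bmm(objects, query).squeeze(-1)        # [batch, num_objects]
        return scores.softmax(dim=-1)  # distribution over candidate objects

# Toy usage: one screen with 5 objects and random features.
model = PhraseGrounder()
probs = model(torch.randint(0, 10000, (1, 5)), torch.rand(1, 5, 4), torch.rand(1, 128))
print(probs.shape)  # torch.Size([1, 5])
```

In the full task, the phrase representation would come from the Transformer that extracts action phrase tuples from the instruction, and the highest-scoring object, together with the predicted action type, would form one step of the executed sequence.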
