Paper information - Mapping natural language commands to web elements

The web provides a rich, open-domain environment with textual, structural, and spatial properties. We propose a new task for grounding language in this environment: given a natural language command (e.g., “click on the second article”), choose the correct element on the web page (e.g., a hyperlink or text box). We collected a dataset of over 50,000 commands that capture various phenomena such as functional references (e.g., “find who made this site”), relational reasoning (e.g., “article by john”), and visual reasoning (e.g., “top-most article”). We also implemented and analyzed three baseline models, each of which captures different phenomena present in the dataset.
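
To make the task's input and output concrete, the following minimal sketch selects an element by plain token overlap between the command and each element's text. It is an illustration only and is not one of the paper's baseline models: the `Element` class, the `choose_element` function, and the toy page are all hypothetical, and real pages carry far more structure than a tag and a text string.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a web page element (not the dataset's schema)."""
    tag: str
    text: str


def tokenize(s: str) -> set[str]:
    # Lowercase and keep alphanumeric runs only.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def choose_element(command: str, elements: list[Element]) -> Element:
    """Return the element whose text has the highest token overlap with the command."""
    cmd = tokenize(command)

    def score(el: Element) -> float:
        toks = tokenize(el.text)
        if not toks:
            return 0.0
        # Jaccard overlap between command tokens and element text tokens.
        return len(cmd & toks) / len(cmd | toks)

    return max(elements, key=score)


# Toy page, invented for illustration.
page = [
    Element("a", "Sports news"),
    Element("a", "Article by John Smith"),
    Element("input", "Search"),
]
best = choose_element("article by john", page)
print(best.text)
```

A purely lexical scorer like this can only match commands that reuse words from the element's text. Commands such as “click on the second article” or “top-most article” depend on structural and spatial properties of the page, which this sketch ignores.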
