How Would You Say It? Eliciting Lexically Diverse Dialogue for Supervised Semantic Parsing

Building dialogue interfaces for real-world scenarios often entails training semantic parsers starting from zero examples. How can we build datasets that better capture the variety of ways users might phrase their queries, and which queries are actually realistic? Wang et al. (2015) proposed building semantic parsing datasets by generating canonical utterances from a grammar and having crowdworkers paraphrase them into natural wording. A limitation of this approach is that it biases workers toward language similar to the canonical utterances. In this work, we present a methodology for eliciting meaningful and lexically diverse queries from users for semantic parsing tasks. Starting from a seed lexicon and a generative grammar, we pair logical forms with mixed text-image representations and ask crowdworkers to paraphrase the queries they are shown and to confirm their plausibility. We use this method to build a semantic parsing dataset from scratch for a dialogue agent in a smart-home simulation. We show that this dataset, which we name SmartHome, is more lexically diverse and harder to parse than existing domain-specific semantic parsing datasets.
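The pipeline's first stage, enumerating (logical form, canonical utterance) pairs from a seed lexicon and grammar, can be illustrated with a toy sketch. This is not the paper's actual grammar or lexicon; the domain entries and logical-form syntax below are invented for illustration only.

```python
import itertools

# Hypothetical seed lexicon for a smart-home domain: each entry maps a
# domain symbol to a canonical phrase. The real lexicon and grammar in
# the paper are richer; this only shows the pairing mechanism.
DEVICES = {"light": "the light", "thermostat": "the thermostat"}
ACTIONS = {"turn_on": "turn on", "turn_off": "turn off"}
ROOMS = {"kitchen": "in the kitchen", "bedroom": "in the bedroom"}

def generate_pairs():
    """Enumerate (logical form, canonical utterance) pairs.

    Each pair would then be shown to crowdworkers (here, as text; the
    paper pairs logical forms with mixed text-image representations)
    to be paraphrased into natural, lexically diverse wording.
    """
    for (act, act_txt), (dev, dev_txt), (room, room_txt) in itertools.product(
            ACTIONS.items(), DEVICES.items(), ROOMS.items()):
        lf = f"{act}(device={dev}, room={room})"
        utterance = f"{act_txt} {dev_txt} {room_txt}"
        yield lf, utterance

pairs = list(generate_pairs())
# 2 actions x 2 devices x 2 rooms = 8 canonical pairs, e.g.
# ("turn_on(device=light, room=kitchen)", "turn on the light in the kitchen")
```

A crowdworker might paraphrase the example pair as "could you switch the kitchen lamp on?", yielding a training example whose wording diverges from the canonical utterance.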

[1] Oliver Lemon et al. Crowd-sourcing NLG Data: Pictures Elicit Better Data. INLG 2016.

[2] Raymond J. Mooney et al. Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes. ACL 2015.

[3] Sumit Gulwani et al. NLyze: Interactive Programming by Natural Language for Spreadsheet Data Analysis and Manipulation. SIGMOD 2014.

[4] Matthew R. Walter et al. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation. AAAI 2011.

[5] Luke S. Zettlemoyer et al. A Joint Model of Language and Perception for Grounded Attribute Learning. ICML 2012.

[6] Raymond J. Mooney et al. Integrated Learning of Dialog Strategies and Semantic Parsing. EACL 2017.

[7] Di Wang et al. CMU OAQA at TREC 2015 LiveQA: Discovering the Right Answer with Clues. TREC 2015.

[8] Raymond J. Mooney et al. Learning to Parse Database Queries Using Inductive Logic Programming. AAAI/IAAI 1996.

[9] Regina Barzilay et al. Using Semantic Unification to Generate Regular Expressions from Natural Language. NAACL 2013.

[10] Pietro Perona et al. Microsoft COCO: Common Objects in Context. ECCV 2014.

[11] Bowen Zhou et al. LSTM-based Deep Learning Models for Non-factoid Answer Selection. arXiv 2015.

[12] Ralf Engel. SPIN: A Semantic Parser for Spoken Dialog Systems. 2006.

[13] Luke S. Zettlemoyer et al. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. UAI 2005.

[14] Amos Azaria et al. Instructable Intelligent Personal Agent. AAAI 2016.

[15] Lu Chen et al. Semantic Parser Enhancement for Dialogue Domain Extension with Little Data. IEEE SLT 2014.

[16] Ming-Wei Chang et al. Driving Semantic Parsing from the World's Response. CoNLL 2010.

[17] Dan Klein et al. Learning Dependency-Based Compositional Semantics. CL 2011.

[18] Jayant Krishnamurthy et al. Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World. TACL 2013.

[19] Mirella Lapata et al. Language to Logical Form with Neural Attention. ACL 2016.

[20] Andrew Chou et al. Semantic Parsing on Freebase from Question-Answer Pairs. EMNLP 2013.

[21] Percy Liang et al. Data Recombination for Neural Semantic Parsing. ACL 2016.

[22] Jonathan Berant et al. Building a Semantic Parser Overnight. ACL 2015.

[23] Luke S. Zettlemoyer et al. Reinforcement Learning for Mapping Instructions to Actions. ACL 2009.