Recursive Template-based Frame Generation for Task Oriented Dialog

The Natural Language Understanding (NLU) component in task-oriented dialog systems processes a user’s request and converts it into structured information that can be consumed by downstream components such as the Dialog State Tracker (DST). This information is typically represented as a semantic frame that captures the intent and slot labels provided by the user. We first show that such a shallow representation is insufficient for complex dialog scenarios, because it does not capture the recursive nature inherent in many domains. We propose a recursive, hierarchical frame-based representation and show how to learn it from data. We formulate frame generation as a template-based tree decoding task, in which the decoder recursively generates a template and then fills slot values into it. We extend local tree-based loss functions with terms that provide global supervision and show how to optimize them end-to-end. We achieve a small improvement on the widely used ATIS dataset and a much larger improvement on a more complex dataset that we describe here.
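To make the idea of a recursive frame concrete, the following is a minimal illustrative sketch, not the model described in the paper. It assumes a frame is an intent plus slots whose values are either strings or nested frames, and that a template is the same tree with empty leaves that are filled afterwards. The names `Frame`, `Template`, `fill`, and `depth`, and the example intent and slot names, are hypothetical and chosen only for illustration; no neural decoder is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Union


@dataclass
class Frame:
    """A recursive semantic frame: an intent plus slots whose values
    are either plain strings or nested frames (hypothetical structure)."""
    intent: str
    slots: Dict[str, Union[str, "Frame"]] = field(default_factory=dict)


@dataclass
class Template:
    """A frame skeleton: leaf slots hold None until they are filled."""
    intent: str
    slots: Dict[str, Union[None, "Template"]] = field(default_factory=dict)


def fill(template: Template, values: Iterator[str]) -> Frame:
    """Fill a template depth-first, left to right, consuming one value
    per leaf slot and recursing into nested templates."""
    filled: Dict[str, Union[str, Frame]] = {}
    for name, child in template.slots.items():
        if child is None:
            filled[name] = next(values)        # leaf slot: take the next value
        else:
            filled[name] = fill(child, values)  # nested template: recurse
    return Frame(template.intent, filled)


def depth(frame: Frame) -> int:
    """Nesting depth of a frame; a flat intent/slot frame has depth 1."""
    nested = [depth(v) for v in frame.slots.values() if isinstance(v, Frame)]
    return 1 + max(nested, default=0)


# Hypothetical example: a flight request whose return leg is itself a frame.
template = Template(
    "book_flight",
    {
        "origin": None,
        "destination": None,
        "return_leg": Template("book_flight", {"date": None}),
    },
)
frame = fill(template, iter(["Seattle", "Boston", "Friday"]))
```

A flat intent-and-slot representation corresponds to a frame of depth 1 in this sketch, whereas the nested `return_leg` slot yields depth 2, which is the kind of structure a shallow frame cannot express.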
