Sequential Dialogue Context Modeling for Spoken Language Understanding

Spoken Language Understanding (SLU) is a key component of goal-oriented dialogue systems that parses user utterances into semantic frame representations. Traditionally, SLU does not use the dialogue history beyond the previous system turn, and contextual ambiguities are left for downstream components to resolve. In this paper, we explore novel approaches for modeling dialogue context in a recurrent neural network (RNN) based language understanding system. We propose the Sequential Dialogue Encoder Network, which encodes context from the dialogue history in chronological order. We compare the performance of our proposed architecture with two context models: one that uses only the previous turn as context, and another that encodes dialogue context in a memory network but loses the order of utterances in the dialogue history. Experiments with a multi-domain dialogue dataset demonstrate that the proposed architecture results in reduced semantic frame error rates.
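
The contrast between an order-insensitive memory and a chronological encoder can be illustrated with a small sketch. This is not the architecture evaluated in the paper: the function names, the toy two-dimensional utterance vectors, and the `decay` parameter are all assumptions made for illustration, and a simple tanh recurrence stands in for a trained RNN. The sketch only shows that an attention-weighted sum over past utterances gives the same summary when the history is reordered, while a recurrent pass over the same utterances does not.

```python
import math

def attention_summary(query, memories):
    """Order-insensitive context: softmax-weighted sum of memory vectors."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memories]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(query)
    return [
        sum(w / total * mem[i] for w, mem in zip(weights, memories))
        for i in range(dim)
    ]

def sequential_summary(memories, decay=0.5):
    """Order-sensitive context: toy recurrence h_t = tanh(decay * h_{t-1} + x_t)."""
    state = [0.0] * len(memories[0])
    for mem in memories:  # utterances are consumed in the order given
        state = [math.tanh(decay * h + x) for h, x in zip(state, mem)]
    return state

# Hypothetical utterance encodings for a three-turn history (oldest first).
history = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.2, 0.8]  # hypothetical encoding of the current utterance
shuffled = list(reversed(history))

attn_original = attention_summary(query, history)
attn_shuffled = attention_summary(query, shuffled)
seq_original = sequential_summary(history)
seq_shuffled = sequential_summary(shuffled)

# True when the attention summary is unchanged by reordering the history.
attention_is_order_invariant = all(
    abs(a - b) < 1e-9 for a, b in zip(attn_original, attn_shuffled)
)
# True when the recurrent summary changes with the order of the history.
recurrence_is_order_sensitive = any(
    abs(a - b) > 1e-3 for a, b in zip(seq_original, seq_shuffled)
)
print(attention_is_order_invariant, recurrence_is_order_sensitive)
```

In this toy setting, reversing the history leaves the attention summary unchanged up to floating-point rounding, whereas the recurrent summary differs, which is the property the abstract attributes to encoding the dialogue history in chronological order.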
