Dialog state tracking with attention-based sequence-to-sequence learning

We present a dialog state tracking system built for the 5th Dialog State Tracking Challenge (DSTC5). The main task of DSTC5 is to track the dialog state in human-human dialogs: for each utterance, the tracker emits a frame of slot-value pairs that reflects the full dialog history up to the current turn. Our system uses an encoder-decoder architecture with an attention mechanism to map an input word sequence to a set of semantic labels, i.e., slot-value pairs, which handles the unknown alignment between utterances and labels. Combining the attention-based tracker with rule-based trackers developed for English and Chinese raised the F-score on the development set from 0.475 (rule-based trackers alone) to 0.507. Refining the combination strategy according to the topic- and slot-level performance of each tracker raised it further to 0.517. We also validate the efficacy of each technique and report the test set results submitted to the challenge.
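
As a rough illustration of how an attention mechanism can sidestep the unknown alignment between words and labels, the following minimal sketch computes attention weights over encoder states and the resulting context vector for one decoder step. It is not the system described here: the dot-product scoring function, the function names `softmax` and `attend`, and the toy vectors are all assumptions made for illustration, and a real tracker would use learned recurrent states rather than hand-written lists.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """One attention step (assumed dot-product scoring, for illustration).

    Returns the attention weights over input positions and the context
    vector, i.e. the weighted sum of the encoder states.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three input positions with 2-dimensional encoder states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

Because the weights are recomputed for every output label, the decoder can focus on different words for different slot-value pairs without requiring word-level alignment annotations.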
