The Dialog State Tracking Challenge

In a spoken dialog system, dialog state tracking deduces information about the user’s goal as the dialog progresses, synthesizing evidence such as dialog acts over multiple turns with external data sources. Recent approaches have been shown to overcome ASR and SLU errors in some applications. However, there are currently no common testbeds or evaluation measures for this task, hampering progress. The dialog state tracking challenge seeks to address this by providing a heterogeneous corpus of 15K human-computer dialogs in a standard format, along with a suite of 11 evaluation metrics. The challenge received a total of 27 entries from 9 research groups. The results show that the suite of performance metrics cluster into 4 natural groups. Moreover, the dialog systems that benefit most from dialog state tracking are those with less discriminative speech recognition confidence scores. Finally, generalization is a key problem: in 2 of the 4 test sets, fewer than half of the entries out-performed simple baselines.

∗Most of the work for the challenge was performed when the second and third authors were with Honda Research Institute, Mountain View, CA, USA.

1 Overview and motivation

Spoken dialog systems interact with users via natural language to help them achieve a goal. As the interaction progresses, the dialog manager maintains a representation of the state of the dialog in a process called dialog state tracking (DST). For example, in a bus schedule information system, the dialog state might indicate the user’s desired bus route, origin, and destination. Dialog state tracking is difficult because automatic speech recognition (ASR) and spoken language understanding (SLU) errors are common, and can cause the system to misunderstand the user’s needs. At the same time, state tracking is crucial because the system relies on the estimated dialog state to choose actions – for example, which bus schedule information to present to the user.
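To make the idea of combining evidence over multiple turns concrete, the following is a minimal sketch, not a method from the challenge or from any cited tracker. It assumes a single hypothetical slot (the bus route), invented SLU N-best lists with confidence scores, and an independence assumption between turns. The function names `latest_top` and `accumulate` are illustrative only.

```python
# Hypothetical SLU output for the "route" slot over two turns.
# Each turn is an N-best list of (value, confidence); all values are invented.
turns = [
    [("61C", 0.6), ("61A", 0.3)],
    [("61A", 0.5), ("61C", 0.4)],
]


def latest_top(turns):
    """Return the highest-confidence value of the most recent turn only."""
    value, _ = max(turns[-1], key=lambda pair: pair[1])
    return value


def accumulate(turns):
    """Score each value as 1 - prod(1 - confidence) over the turns that mention it.

    Treating turns as independent pieces of evidence is a simplifying
    assumption made for this sketch.
    """
    not_observed = {}
    for nbest in turns:
        for value, confidence in nbest:
            not_observed[value] = not_observed.get(value, 1.0) * (1.0 - confidence)
    return {value: 1.0 - p for value, p in not_observed.items()}


scores = accumulate(turns)
tracked = max(scores, key=scores.get)
print(latest_top(turns), tracked)  # prints "61A 61C"
```

In this toy input the second turn alone favors route 61A, while the evidence combined over both turns favors 61C, which illustrates why a tracker that looks beyond the latest recognition result can recover from an individual ASR or SLU error.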
Most commercial systems use hand-crafted heuristics for state tracking, selecting the SLU result with the highest confidence score, and discarding alternatives. In contrast, statistical approaches compute scores for many hypotheses for the dialog state (Figure 1). By exploiting correlations between turns and information from external data sources – such as maps, bus timetables, or models of past dialogs – statistical approaches can overcome some SLU errors. Numerous techniques for dialog state tracking have been proposed, including heuristic scores (Higashinaka et al., 2003), Bayesian networks (Paek and Horvitz, 2000; Williams and Young, 2007), kernel density estimators (Ma et al., 2012), and discriminative models (Bohus and Rudnicky, 2006). Techniques have been fielded which scale to realistically sized dialog problems and operate in real time (Young et al., 2010; Thomson and Young, 2010; Williams, 2010; Mehta et al., 2010). In end-to-end dialog systems, dialog state tracking has been shown to improve overall system performance (Young et al., 2010; Thomson and Young, 2010).

Despite this progress, direct comparisons between methods have not been possible because past studies use different domains and different system components for speech recognition, spoken language understanding, dialog control, etc. Moreover, there is little agreement on how to evaluate dialog state tracking. Together these issues limit progress in this research area. The Dialog State Tracking Challenge (DSTC) provides a first common testbed and evaluation

[1] Milica Gasic, et al. The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management, 2010, Comput. Speech Lang.

[2] Mikio Nakano, et al. Corpus-Based Discourse Understanding in Spoken Dialogue Systems, 2003, ACL.

[3] Maxine Eskénazi, et al. Spoken Dialog Challenge 2010, 2010, 2010 IEEE Spoken Language Technology Workshop.

[4] Alexander I. Rudnicky, et al. Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[5] Alexander I. Rudnicky, et al. A “K Hypotheses + Other” Belief Updating Model, 2006.

[6] Rakesh Gupta, et al. Probabilistic Ontology Trees for Belief Tracking in Dialog Systems, 2010, SIGDIAL Conference.

[7] Antoine Raux, et al. Dialog State Tracking Challenge Handbook, 2012.

[8] Rakesh Gupta, et al. Landmark-Based Location Belief Tracking in a Spoken Dialog System, 2012, SIGDIAL Conference.

[9] M. Kendall. A New Measure of Rank Correlation, 1938.

[10] Eric Horvitz, et al. Conversation as Action Under Uncertainty, 2000, UAI.

[11] Steve J. Young, et al. Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems, 2010, Comput. Speech Lang.

[12] Steve J. Young, et al. Partially observable Markov decision processes for spoken dialog systems, 2007, Comput. Speech Lang.

[13] Maxine Eskénazi, et al. Toward better crowdsourced transcription: Transcription of a year of the Let's Go Bus Information System data, 2010, 2010 IEEE Spoken Language Technology Workshop.

[14] Jason D. Williams. Incremental partition recombination for efficient tracking of multiple dialog states, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.