An overview of end-to-end language understanding and dialog management for personal digital assistants

Spoken language understanding and dialog management have emerged as key technologies for interacting with personal digital assistants (PDAs). The coverage, complexity, and scale of PDAs are much larger than those of previous conversational understanding systems, and as a result new problems arise. In this paper, we provide an overview of the language understanding and dialog management capabilities of PDAs, focusing particularly on Cortana, Microsoft's PDA. We explain the system architecture for language understanding and dialog management in our PDA, indicate how it differs from prior state-of-the-art systems, and describe its key components. We also report a set of experiments detailing system performance on a variety of scenarios and tasks. Finally, we describe how the quality of the user experience is measured end-to-end and discuss open issues.
