Distributed open-domain conversational understanding framework with domain independent extractors

Traditional spoken dialog systems are usually based on a centralized architecture in which the number of domains is predefined and the provider is fixed for a given domain and intent. The spoken language understanding (SLU) component is responsible for detecting the domain and intent and for filling domain-specific slots. In this architecture, adding new or competing domains, intents, or providers is expensive and time-consuming. The rapid growth of service providers in the mobile computing market calls for an extensible dialog system framework. This paper presents a distributed dialog infrastructure in which each domain or provider is agnostic of the others and processes user utterances independently using its own knowledge or models, so that new domains and new providers can be incorporated easily. In addition, to help each service provider build its own SLU models or algorithms, we introduce a new component, the extractor, which provides intermediate semantic annotations such as entity mention tags; extractors are pluggable as well. Each service provider can then rapidly develop its SLU parser with minimal effort by providing, if needed, some example sentences annotated with intents and slots. Our preliminary experimental results demonstrate the advantages of this framework over a centralized architecture.
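
The architecture described above can be pictured with a minimal Python sketch. This is an illustration only, not the paper's implementation: every name here (`Annotation`, `Interpretation`, `Dispatcher`, the toy extractor and providers) is hypothetical, and ranking the providers' outputs by a self-reported score is an assumption, since the abstract does not say how competing interpretations are arbitrated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# All names in this sketch are hypothetical; the abstract defines no API.


@dataclass
class Annotation:
    tag: str   # annotation type, e.g. "city"
    text: str  # matched surface string


@dataclass
class Interpretation:
    provider: str
    intent: str
    slots: Dict[str, str]
    score: float  # assumed self-reported confidence


# An extractor maps an utterance to domain-independent annotations.
Extractor = Callable[[str], List[Annotation]]
# A provider maps (utterance, annotations) to an interpretation, or None.
Provider = Callable[[str, List[Annotation]], Optional[Interpretation]]


def city_extractor(utterance: str) -> List[Annotation]:
    """Toy gazetteer lookup standing in for an entity mention tagger."""
    lowered = utterance.lower()
    return [Annotation("city", c) for c in ("seattle", "boston") if c in lowered]


def flight_provider(utterance: str, annotations: List[Annotation]) -> Optional[Interpretation]:
    """Toy provider: knows nothing about other providers."""
    if "fly" not in utterance.lower():
        return None
    cities = [a.text for a in annotations if a.tag == "city"]
    slots = {"destination": cities[0]} if cities else {}
    return Interpretation("flights", "book_flight", slots, 0.9 if cities else 0.5)


def weather_provider(utterance: str, annotations: List[Annotation]) -> Optional[Interpretation]:
    """Second toy provider, reusing the same shared annotations."""
    if "weather" not in utterance.lower():
        return None
    cities = [a.text for a in annotations if a.tag == "city"]
    slots = {"city": cities[0]} if cities else {}
    return Interpretation("weather", "get_forecast", slots, 0.8 if cities else 0.4)


class Dispatcher:
    """Runs all extractors once, then lets each provider parse independently."""

    def __init__(self) -> None:
        self.extractors: List[Extractor] = []
        self.providers: List[Provider] = []

    def register_extractor(self, extractor: Extractor) -> None:
        self.extractors.append(extractor)

    def register_provider(self, provider: Provider) -> None:
        self.providers.append(provider)

    def understand(self, utterance: str) -> List[Interpretation]:
        annotations = [a for ex in self.extractors for a in ex(utterance)]
        results = []
        for provider in self.providers:
            result = provider(utterance, annotations)
            if result is not None:
                results.append(result)
        # Assumption: rank candidates by each provider's own score.
        return sorted(results, key=lambda r: r.score, reverse=True)
```

In this sketch, adding a new provider or extractor is a single `register_*` call and requires no change to existing providers, which is the extensibility property the abstract emphasizes.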
