Bag of Experts Architectures for Model Reuse in Conversational Language Understanding

Slot tagging, the task of detecting entities in input user utterances, is a key component of natural language understanding systems for personal digital assistants. Since each new domain requires a different set of slots, the annotation cost of labeling training data for slot tagging models increases rapidly as the number of domains grows. To tackle this, we describe Bag of Experts (BoE) architectures for model reuse in both LSTM- and CRF-based models. Extensive experimentation over a dataset of 10 domains drawn from data relevant to our commercial personal digital assistant shows that our BoE models outperform the baseline models by a statistically significant average margin of 5.06% in absolute F1-score when training with 2000 instances per domain, and achieve an even higher improvement of 12.16% when only 25% of the training data is used.
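The core reuse idea behind a bag of experts can be illustrated with a minimal sketch (hypothetical names, not the authors' implementation): feature extractors pre-trained on source domains act as frozen "experts", and their per-token outputs are concatenated into an extra feature vector that a target-domain tagger consumes. The toy experts below use surface cues as stand-ins for the recurrent encoders a real system would train on other domains' slot-tagging data.

```python
# Hedged sketch of the bag-of-experts feature concatenation (illustrative
# only; names such as `places_expert` are invented for this example).
from typing import Callable, List

# An expert maps (tokens, position) to a per-token feature vector.
Expert = Callable[[List[str], int], List[float]]

def places_expert(tokens: List[str], i: int) -> List[float]:
    # Toy expert: fires on capitalized tokens, a crude proxy for place names.
    return [1.0 if tokens[i][:1].isupper() else 0.0]

def dates_expert(tokens: List[str], i: int) -> List[float]:
    # Toy expert: fires on digit-bearing tokens, a crude proxy for dates.
    return [1.0 if any(c.isdigit() for c in tokens[i]) else 0.0]

def boe_features(tokens: List[str], experts: List[Expert]) -> List[List[float]]:
    """Concatenate every frozen expert's output for each token (the 'bag').

    In a full system, this vector would be appended to word embeddings
    before the target domain's LSTM or CRF tagger.
    """
    return [
        [f for expert in experts for f in expert(tokens, i)]
        for i in range(len(tokens))
    ]

feats = boe_features("book a flight to Paris on 5 May".split(),
                     [places_expert, dates_expert])
```

Because the experts are reused unchanged across target domains, only the small target-side tagger needs to be trained on the new domain's (scarce) labeled data, which is what drives the low-resource gains reported above.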
