Efficient Large-Scale Domain Classification with Personalized Attention

In this paper, we explore the task of mapping spoken-language utterances to one of thousands of natural language understanding domains in intelligent personal digital assistants (IPDAs). This scenario arises in many mainstream industry IPDAs that allow third parties to develop thousands of new domains alongside the built-in ones, rapidly increasing domain coverage and overall IPDA capabilities. We propose a scalable neural architecture that solves this problem efficiently by combining a shared encoder, a novel attention mechanism that incorporates personalization information, and domain-specific classifiers. The architecture is designed to accommodate new domains that appear between full model retraining cycles through a rapid bootstrapping mechanism that is two orders of magnitude faster than retraining. We account for the practical constraints of real-time production systems, designing for minimal memory footprint and runtime latency. We demonstrate that incorporating personalization yields significantly more accurate domain classification in a setting with thousands of overlapping domains.
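
The paper itself does not include code, but the following is a minimal PyTorch sketch of the architecture as the abstract describes it: a shared BiLSTM utterance encoder, an attention mechanism over the domains a given user has personally enabled, and per-domain classifiers realized as rows of a single output matrix. All names, layer sizes, and the mean-pooled attention query are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PersonalizedDomainClassifier(nn.Module):
    # Shared encoder + personalized attention + per-domain classifiers
    # (hypothetical sketch; dimensions and pooling are assumptions).
    def __init__(self, vocab_size, num_domains, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared BiLSTM utterance encoder, reused across all domains.
        self.encoder = nn.LSTM(emb_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        enc_dim = 2 * hidden_dim
        # One learned vector per domain; attended over for personalization.
        self.domain_embed = nn.Embedding(num_domains, enc_dim)
        # Per-domain classifiers: row d of this matrix scores domain d,
        # so thousands of overlapping domains are scored independently.
        self.out = nn.Linear(2 * enc_dim, num_domains)

    def forward(self, tokens, enabled_domains):
        # tokens: (batch, seq_len) token ids for the utterance.
        # enabled_domains: (batch, k) ids of domains this user has enabled.
        states, _ = self.encoder(self.embed(tokens))  # (batch, seq, enc_dim)
        h = states.mean(dim=1)                        # utterance summary
        # Personalized attention: weight the user's enabled domains by
        # their relevance to the current utterance.
        d = self.domain_embed(enabled_domains)        # (batch, k, enc_dim)
        scores = torch.bmm(d, h.unsqueeze(2)).squeeze(2)  # (batch, k)
        alpha = F.softmax(scores, dim=1)
        a = torch.bmm(alpha.unsqueeze(1), d).squeeze(1)   # (batch, enc_dim)
        # Each domain classifier sees the utterance representation plus
        # the personalization summary.
        return self.out(torch.cat([h, a], dim=1))     # (batch, num_domains)

Under a design like this, bootstrapping a newly published domain would amount to appending one row to domain_embed and one row to the output layer and training only those parameters with the shared encoder frozen, which would be consistent with the abstract's claim of a bootstrap two orders of magnitude faster than full retraining; the exact bootstrapping procedure is an assumption here, not taken from the source.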
