Prior Knowledge Driven Label Embedding for Slot Filling in Natural Language Understanding

Traditional slot filling in natural language understanding (NLU) predicts a one-hot vector for each word. This form of label representation cannot model semantic correlations among slots, which leads to a severe data sparsity problem, especially when adapting an NLU model to a new domain. To address this issue, a novel label embedding based slot filling framework is proposed in this article, in which a distributed label embedding is constructed for each slot using prior knowledge. Three encoding methods are investigated to incorporate different kinds of prior knowledge about slots: atomic concepts, slot descriptions, and slot exemplars. The proposed label embeddings encourage sharing of text patterns and reuse of data across different slot labels, which makes them useful for adaptive NLU with limited data. Moreover, since the label embedding is independent of the NLU model, it is compatible with almost all deep learning based slot filling models. The proposed approaches are evaluated on three datasets. Experiments on single-domain and domain adaptation tasks show that label embedding achieves significant performance improvements over the traditional one-hot label representation as well as advanced zero-shot approaches.
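
To make the idea concrete, below is a minimal sketch of how a slot filler can score each word against label embeddings built from slot descriptions instead of using a softmax layer over independent one-hot labels. It assumes PyTorch and a pre-trained word-vector lookup; the class and function names (e.g. LabelEmbeddingSlotFiller) and the simple averaging encoder are illustrative assumptions, not the authors' implementation.

    # A minimal sketch (not the authors' code) of replacing the one-hot output
    # layer with prior-knowledge label embeddings.
    # Assumes PyTorch; `pretrained_vecs` is any word -> tensor lookup (e.g. word2vec).

    import torch
    import torch.nn as nn


    def description_label_embeddings(slot_descriptions, pretrained_vecs, dim):
        """Build one embedding per slot by averaging the word vectors of its
        natural-language description (one of the three kinds of prior knowledge
        in the paper; the averaging here is an illustrative simplification)."""
        label_vecs = []
        for desc in slot_descriptions:                      # e.g. "departure city"
            words = desc.lower().split()
            vecs = [pretrained_vecs.get(w, torch.zeros(dim)) for w in words]
            label_vecs.append(torch.stack(vecs).mean(dim=0))
        return torch.stack(label_vecs)                      # (num_slots, dim)


    class LabelEmbeddingSlotFiller(nn.Module):
        def __init__(self, vocab_size, emb_dim, hidden_dim, label_embeddings):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                   bidirectional=True)
            # Label matrix built from prior knowledge; kept fixed here,
            # but it could also be fine-tuned.
            self.label_emb = nn.Parameter(label_embeddings, requires_grad=False)
            self.proj = nn.Linear(2 * hidden_dim, label_embeddings.size(1))

        def forward(self, token_ids):
            h, _ = self.encoder(self.embed(token_ids))      # (B, T, 2H)
            h = self.proj(h)                                # (B, T, label_dim)
            # Emission score = similarity between each word state and each label
            # embedding, so semantically related slots receive related scores.
            return h @ self.label_emb.t()                   # (B, T, num_slots)

Because the output layer only depends on the label embedding matrix, adapting to a new domain amounts to constructing embeddings for the new slots from their prior knowledge and reusing the trained encoder, which is what makes the approach attractive when in-domain data is limited.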
