Meta Learning to Classify Intent and Slot Labels with Noisy Few Shot Examples

Deep learning has recently come to dominate many areas of machine learning, including spoken language understanding (SLU). However, deep learning models are notoriously data-hungry, and heavily optimized models are sensitive to the quality of the training examples provided and to the consistency between training and inference conditions. To improve the performance of SLU models on tasks with noisy, low-resource training data, we propose a new SLU benchmarking task: few-shot robust SLU, where SLU comprises two core problems, intent classification (IC) and slot labeling (SL). We establish the task by defining few-shot splits on three public IC/SL datasets, ATIS, SNIPS, and TOP, and by adding two types of natural noise (missing or replaced adaptation examples, and modality mismatch) to the splits. We further propose a novel noise-robust few-shot SLU model based on prototypical networks. We show that the model consistently outperforms the conventional fine-tuning baseline and another popular meta-learning method, Model-Agnostic Meta-Learning (MAML): it achieves higher IC accuracy and SL F1, and exhibits smaller performance variation when noise is present.
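
The classification step of a prototypical network is simple enough to sketch. The snippet below is a minimal, hypothetical illustration of how intent prototypes could be formed from a few-shot support set and used to score query utterances; the encoder, tensor shapes, and function names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def prototypical_logits(support_emb, support_labels, query_emb, num_classes):
    """Score queries by (negative squared) distance to per-intent prototypes.

    support_emb:    (N_support, D) encoded support utterances
    support_labels: (N_support,)   intent ids in [0, num_classes)
    query_emb:      (N_query, D)   encoded query utterances
    """
    # Prototype of each intent = mean embedding of its support examples.
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                               # (num_classes, D)
    # Negative squared Euclidean distance serves as the class logit.
    dists = torch.cdist(query_emb, prototypes) ** 2  # (N_query, num_classes)
    return -dists

# Episode-style usage (enc is some utterance encoder, assumed here):
# logits = prototypical_logits(enc(support_x), support_y, enc(query_x), C)
# loss = F.cross_entropy(logits, query_y)
```

Because the classifier is defined by the support-set prototypes rather than by a fixed output layer, the same encoder can be evaluated on new intents with only a handful of labeled examples, which is what makes this family of models attractive for few-shot IC/SL.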
