DIET: Lightweight Language Understanding for Dialogue Systems

Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms fine-tuning BERT and is about six times faster to train.
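
The abstract gives no implementation details, so the following is only a minimal sketch of the kind of joint model it describes: a shared transformer encoder feeding two heads, one for utterance-level intent classification and one for token-level entity tagging. It assumes a Keras-style setup; the single encoder block, layer sizes, losses, and all names are illustrative assumptions, and the actual DIET architecture differs in its input features, losses, and training details.

```python
# Minimal sketch (not the authors' implementation): a shared transformer
# encoder with two heads -- utterance-level intent classification and
# token-level entity tagging. All sizes and names are assumptions.
import tensorflow as tf
from tensorflow.keras import layers


def build_joint_nlu_model(vocab_size=10000, max_len=64, num_intents=20,
                          num_entity_tags=10, embed_dim=128, num_heads=4,
                          ff_dim=256):
    tokens = layers.Input(shape=(max_len,), dtype="int32", name="tokens")

    # Token embeddings; pre-trained embeddings are optional -- the abstract
    # reports strong results even in a purely supervised setup.
    x = layers.Embedding(vocab_size, embed_dim)(tokens)

    # One self-attention block standing in for the transformer encoder.
    # (Positional encodings and padding masks are omitted for brevity.)
    attn = layers.MultiHeadAttention(num_heads=num_heads,
                                     key_dim=embed_dim // num_heads)(x, x)
    x = layers.LayerNormalization()(x + attn)
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(embed_dim)(ff)
    x = layers.LayerNormalization()(x + ff)

    # Entity head: per-token tag scores (a CRF layer is a common alternative).
    entity_logits = layers.Dense(num_entity_tags, name="entities")(x)

    # Intent head: pool the token representations, classify the utterance.
    pooled = layers.GlobalAveragePooling1D()(x)
    intent_logits = layers.Dense(num_intents, name="intent")(pooled)

    return tf.keras.Model(inputs=tokens,
                          outputs={"intent": intent_logits,
                                   "entities": entity_logits})


model = build_joint_nlu_model()
model.compile(
    optimizer="adam",
    loss={
        "intent": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        "entities": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    },
)
```

The point of the joint setup is that a single forward pass yields both the intent prediction and the entity tags, and both losses update the same encoder weights.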

[1] Steve J. Young, et al. Partially observable Markov decision processes for spoken dialog systems, 2007, Comput. Speech Lang.

[2] Takeshi Naemura, et al. Classification-Reconstruction Learning for Open-Set Recognition, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Bing Liu, et al. Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling, 2016, INTERSPEECH.

[4] Sebastian Ruder, et al. Fine-tuned Language Models for Text Classification, 2018, ArXiv.

[5] Xuanjing Huang, et al. How to Fine-Tune BERT for Text Classification?, 2019, CCL.

[6] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[7] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.

[8] Tassilo Klein, et al. Attention Is (not) All You Need for Commonsense Reasoning, 2019, ACL.

[9] Noah A. Smith, et al. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, 2019, RepL4NLP@ACL.

[10] Matthew Henderson, et al. Training Neural Response Selection for Task-Oriented Dialogue Systems, 2019, ACL.

[11] Verena Rieser, et al. Benchmarking Natural Language Understanding Services for building Conversational Agents, 2019, IWSDS.

[13] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[14] Xiaodong Liu, et al. Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding, 2019, ArXiv.

[15] Wilson L. Taylor, et al. “Cloze Procedure”: A New Tool for Measuring Readability, 1953.

[16] Oliver Lemon, et al. Hierarchical Multi-Task Natural Language Understanding for Cross-domain Conversational AI: HERMIT NLU, 2019, SIGdial.

[17] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.

[18] Andrew McCallum, et al. Energy and Policy Considerations for Deep Learning in NLP, 2019, ACL.

[19] Wen Wang, et al. BERT for Joint Intent Classification and Slot Filling, 2019, ArXiv.

[20] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.

[21] Mitchell P. Marcus, et al. Text Chunking using Transformation-Based Learning, 1995, VLC@ACL.

[22] Cordelia Schmid, et al. VideoBERT: A Joint Model for Video and Language Representation Learning, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23] Jieh Hsiang, et al. PatentBERT: Patent Classification with Fine-Tuning a pre-trained BERT Model, 2019, ArXiv.

[24] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.

[25] Guillaume Lample, et al. Neural Architectures for Named Entity Recognition, 2016, NAACL.

[26] Maxine Eskénazi, et al. Structured Fusion Networks for Dialog, 2019, SIGdial.

[27] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[28] Geoffrey Zweig, et al. Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning, 2017, ACL.

[29] Vladimir Vlasov, et al. Dialogue Transformers, 2019, ArXiv.

[30] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.

[31] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.

[32] George R. Doddington, et al. The ATIS Spoken Language Systems Pilot Corpus, 1990, HLT.

[33] John B. Lowe, et al. The Berkeley FrameNet Project, 1998, ACL.

[34] Matthew Henderson, et al. Efficient Intent Detection with Dual Sentence Encoders, 2020, NLP4CONVAI.

[35] Niketa Gandhi, et al. Bidirectional LSTM Joint Model for Intent Classification and Named Entity Recognition in Natural Language Understanding, 2018, ISDA.

[36] Maosong Sun, et al. ERNIE: Enhanced Language Representation with Informative Entities, 2019, ACL.

[37] Chih-Li Huo, et al. Slot-Gated Modeling for Joint Slot Filling and Intent Prediction, 2018, NAACL.

[38] Houfeng Wang, et al. A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding, 2016, IJCAI.

[39] Nathalie Japkowicz, et al. The class imbalance problem: A systematic study, 2002, Intell. Data Anal.

[40] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[41] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[42] Jimmy J. Lin, et al. DocBERT: BERT for Document Classification, 2019, ArXiv.

[43] Meina Song, et al. A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling, 2019, ACL.