Probing Task-Oriented Dialogue Representation from Language Models

This paper probes pre-trained language models to find out which model intrinsically carries the most informative representations for task-oriented dialogue tasks. We approach the problem from two angles: a supervised classifier probe and an unsupervised mutual information probe. For the supervised probe, we fine-tune a feed-forward layer as a classifier on top of a fixed pre-trained language model using annotated labels. For the unsupervised probe, we propose a mutual information measure that evaluates the mutual dependence between the clustering induced by the gold labels (the "real" clustering) and a clustering of the model's representations. The goals of this empirical paper are to 1) investigate probing techniques, especially from the unsupervised mutual information angle, 2) provide guidelines for selecting pre-trained language models for the dialogue research community, and 3) identify pre-training factors that may be key to the success of dialogue applications.
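As a concrete illustration of the supervised probe, the sketch below trains a single linear layer on top of frozen BERT features. It assumes HuggingFace Transformers and PyTorch; the model name, label count, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# Load a fixed (frozen) pre-trained encoder; gradients never reach it.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():
    p.requires_grad = False

# The probe itself: one feed-forward layer mapping the [CLS]
# representation to the label space (num_labels is task-dependent).
num_labels = 28  # illustrative, e.g. the number of intent classes
probe = nn.Linear(encoder.config.hidden_size, num_labels)
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def probe_step(utterances, labels):
    """One supervised training step: only the probe's weights move."""
    batch = tokenizer(utterances, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():  # the encoder stays fixed
        hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] vector
    logits = probe(hidden)
    loss = loss_fn(logits, torch.tensor(labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The unsupervised probe can be sketched similarly: cluster the fixed representations and score the agreement between the induced clustering and the gold labels with a chance-corrected mutual information measure. The snippet below uses scikit-learn's k-means and adjusted mutual information as one such pairing; this specific choice is an assumption for illustration, not necessarily the paper's setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score

def mutual_info_probe(embeddings: np.ndarray, gold_labels: np.ndarray) -> float:
    """Cluster fixed representations, then measure how much the induced
    clustering agrees with the clustering given by the gold labels."""
    n_clusters = len(np.unique(gold_labels))  # one cluster per gold class
    induced = KMeans(n_clusters=n_clusters, n_init=10,
                     random_state=0).fit_predict(embeddings)
    # Adjusted MI corrects raw mutual information for chance agreement.
    return adjusted_mutual_info_score(gold_labels, induced)
```

A higher score under this measure suggests the frozen representations already organize utterances along the task's label structure, without any supervised fine-tuning.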
