DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TOD domains. In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for task-oriented dialog. Within our DS-TOD framework, we first automatically extract salient domain-specific terms and then use them to construct DOMAINCC and DOMAINREDDIT – resources that we leverage for domain-specific pretraining based on (i) masked language modeling (MLM) and (ii) response selection (RS) objectives, respectively. We further propose a resource-efficient and modular domain specialization by means of domain adapters – additional parameter-light layers in which we encode the domain knowledge. Our experiments with two prominent TOD tasks – dialog state tracking (DST) and response retrieval (RR) – encompassing five domains from the MULTIWOZ TOD benchmark demonstrate the effectiveness of our domain specialization approach. Moreover, we show that the lightweight adapter-based specialization (1) performs comparably to full fine-tuning in single-domain setups and (2) is particularly suitable for multi-domain specialization, where, besides its advantageous computational footprint, it can also offer better downstream performance.
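
Because the abstract only sketches the pipeline, the snippet below is a minimal sketch of the MLM-based domain-specialization step: further pretraining a BERT-style PLM on a file of domain-filtered sentences standing in for a single-domain slice of DOMAINCC. The use of the Hugging Face transformers and datasets libraries, the model name, the hyperparameters, and the file name domaincc_taxi.txt are assumptions for illustration, not the authors' exact toolchain.

```python
# Hedged sketch: MLM further pretraining on a hypothetical single-domain corpus
# (one sentence per line, filtered to contain salient terms of a TOD domain,
# e.g. "taxi"). Not the authors' released pipeline.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "domaincc_taxi.txt" is a hypothetical file standing in for DOMAINCC (taxi domain).
dataset = load_dataset("text", data_files={"train": "domaincc_taxi.txt"})

def tokenize(batch):
    # Truncate to short sequences; the collator pads dynamically per batch.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking of 15% of input tokens.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-taxi-mlm",
        num_train_epochs=1,
        per_device_train_batch_size=32,
    ),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

For the adapter-based variant described in the abstract, the same MLM objective would be used, but with lightweight adapter layers inserted into the (otherwise frozen) PLM and only those adapter parameters updated, e.g. via AdapterHub; the full-model training loop above is just the simpler full fine-tuning counterpart.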
