Few-Shot Dialogue Generation Without Annotated Data: A Transfer Learning Approach

Learning with minimal data is one of the key challenges in the development of practical, production-ready goal-oriented dialogue systems. In a real-world enterprise setting, where dialogue systems are developed rapidly and are expected to work robustly for an ever-growing variety of domains, products, and scenarios, efficient learning from a limited number of examples becomes indispensable. In this paper, we introduce a technique that achieves state-of-the-art dialogue generation performance in a few-shot setup without using any annotated data. We do this by leveraging background knowledge from a larger, more highly represented dialogue source, namely the MetaLWOz dataset. We evaluate our model on the Stanford Multi-Domain Dialogue Dataset, which consists of human-human goal-oriented dialogues in the in-car navigation, appointment scheduling, and weather information domains. We show that our few-shot approach achieves state-of-the-art results on that dataset, consistently outperforming the previous best model in BLEU and Entity F1 scores while being more data-efficient, since it requires no data annotation.
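The transfer-learning recipe the abstract describes, training one response-generation model first on a large background dialogue corpus and then on a handful of target-domain dialogues, can be sketched as follows. This is a minimal illustration in PyTorch with placeholder model sizes, hyperparameters, and random stand-in batches; it is not the authors' actual architecture or configuration.

```python
import torch
import torch.nn as nn

PAD, VOCAB, HIDDEN = 0, 1000, 256  # placeholder vocabulary and model sizes

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder over token ids (illustrative architecture)."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, HIDDEN, padding_idx=PAD)
        self.enc = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.dec = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, src, tgt):
        _, h = self.enc(self.emb(src))           # encode the dialogue context
        dec_out, _ = self.dec(self.emb(tgt), h)  # teacher-forced decoding
        return self.out(dec_out)

def train(model, batches, epochs, lr):
    """One training phase; called twice so both phases update the same weights."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)
    for _ in range(epochs):
        for src, tgt in batches:
            logits = model(src, tgt[:, :-1])     # predict each next response token
            loss = loss_fn(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

def fake_batches(n):
    """Random (context, response) token-id batches standing in for real data."""
    return [(torch.randint(1, VOCAB, (8, 20)), torch.randint(1, VOCAB, (8, 12)))
            for _ in range(n)]

model = Seq2Seq()
# Phase 1: background training on the large corpus (MetaLWOz in the paper).
train(model, fake_batches(100), epochs=3, lr=1e-3)
# Phase 2: few-shot adaptation on a handful of target-domain dialogues (SMD).
train(model, fake_batches(2), epochs=20, lr=1e-4)
```

The key point is that both phases share a single set of parameters, so the few-shot phase starts from representations learned on the background dialogues rather than from scratch, and no turn-level annotations are needed in either phase.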
