Teaching Models new APIs: Domain-Agnostic Simulators for Task Oriented Dialogue

We demonstrate that large language models can simulate task-oriented dialogues in novel domains, provided only with an API implementation and a list of goals. We show that these simulations can be used to formulate online, automatic metrics that correlate well with human evaluations. Furthermore, by checking whether the user's goals are met, we can use simulation to repeatedly generate training data and improve the quality of the simulations themselves. With no human intervention or domain-specific training data, our simulations bootstrap end-to-end models that achieve a 37% error reduction in previously unseen domains. By including as few as 32 domain-specific conversations, bootstrapped models can match the performance of a fully-supervised model trained on 10× more data. To our knowledge, this is the first time simulations have been shown to be effective at bootstrapping models without explicitly requiring any domain-specific training data, rule engineering, or humans-in-the-loop.
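The bootstrapping loop described above can be illustrated with a minimal sketch: a user simulator and an agent exchange turns around a given API implementation, and only dialogues in which the user's goal is satisfied are kept as training data. All names here (the toy `weather_api`, the rule-based `user_turn` and `agent_turn` stand-ins for the language models) are hypothetical illustrations, not the paper's actual implementation.

```python
# Sketch of goal-filtered dialogue self-play (toy stand-ins, assumed names).

def weather_api(city):
    # Stand-in for the API implementation the simulator is given.
    return {"city": city, "forecast": "sunny"}

def user_turn(goal, history):
    # Toy user simulator: states the goal, then ends the dialogue.
    return f"What is the weather in {goal['city']}?" if not history else "Thanks!"

def agent_turn(user_utterance, api):
    # Toy agent: extracts the city, calls the API, and reports the result.
    city = user_utterance.rstrip("?").split()[-1]
    result = api(city)
    return f"The forecast for {result['city']} is {result['forecast']}."

def simulate(goal, api, max_turns=4):
    # Roll out one dialogue between the user simulator and the agent.
    history = []
    for _ in range(max_turns):
        u = user_turn(goal, history)
        history.append(("user", u))
        if u == "Thanks!":
            break
        history.append(("agent", agent_turn(u, api)))
    return history

def goal_met(goal, history):
    # Automatic check: did any agent turn satisfy the user's goal?
    return any(goal["city"] in turn for speaker, turn in history
               if speaker == "agent")

# Keep only successful dialogues as training data for the next round.
goals = [{"city": "Paris"}, {"city": "Tokyo"}]
training_data = []
for g in goals:
    dialogue = simulate(g, weather_api)
    if goal_met(g, dialogue):
        training_data.append((g, dialogue))
```

In the paper's setting both `user_turn` and `agent_turn` would be language-model calls, and repeating this filter-and-retrain loop is what improves the simulations themselves.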
