Frugal Prompting for Dialog Models

The use of large language models (LLMs) in natural language processing (NLP) is growing rapidly, changing how researchers approach problems in the field. To fully exploit these models' abilities, a better understanding of how they behave under different input protocols is required. With LLMs, users can define and solve tasks by interacting with the model directly through a text-based interface, so it is important to understand the conversational abilities of LLMs even when they have not been trained specifically for dialog modeling. This study examines approaches to building dialog systems with LLMs by varying several aspects of the prompt. As part of prompt tuning, we experiment with different ways of providing the instruction, exemplars, the current query, and additional context. The research also analyzes which representations of the dialog history offer the optimal usable-information density. Based on these findings, the paper suggests more compact ways of encoding dialog history that preserve performance while reducing the model's inference-API costs. The research contributes to a better understanding of how LLMs can be used effectively to build interactive systems.
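
To make the prompt-composition idea concrete, here is a minimal sketch of how such a frugal prompt might be assembled, with older dialog turns compressed into an abstractive summary. This is an illustration under assumptions: the `build_frugal_prompt` helper, the word-count budget, the prompt template, and the choice of a SAMSum-tuned BART summarizer are all hypothetical stand-ins, not the paper's actual method.

```python
# Minimal sketch of "frugal" prompt assembly for an LLM-based dialog system.
# All names and the template below are illustrative assumptions, not the
# paper's exact protocol.
from transformers import pipeline

# Abstractive summarizer used to compress older dialog turns; a BART model
# fine-tuned on dialog summarization (SAMSum) is one plausible choice.
summarizer = pipeline("summarization", model="philschmid/bart-large-cnn-samsum")

def build_frugal_prompt(instruction, exemplars, history, query, budget_words=512):
    """Compose instruction + exemplars + compact dialog history + current query.

    Turns older than the last two are replaced by an abstractive summary so
    the prompt stays within a rough word budget.
    """
    recent, older = history[-2:], " ".join(history[:-2])
    context = "\n".join(recent)
    if older:
        summary = summarizer(older, max_length=60, min_length=10,
                             do_sample=False)[0]["summary_text"]
        context = f"Earlier conversation (summary): {summary}\n" + context

    def render(exs):
        parts = [instruction] + [f"Example:\n{e}" for e in exs]
        parts += [context, f"User: {query}", "Assistant:"]
        return "\n\n".join(p for p in parts if p)

    prompt = render(exemplars)
    # Crude budget check: drop exemplars from the end until the prompt fits.
    while len(prompt.split()) > budget_words and exemplars:
        exemplars = exemplars[:-1]
        prompt = render(exemplars)
    return prompt
```

Keeping the most recent turns verbatim while summarizing the rest is one simple way to trade prompt length against usable-information density; where the budget is measured in tokens rather than words, the model's own tokenizer should replace the `split()` proxy.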
