Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models

Most open-domain dialogue systems suffer from forgetting important information, especially in long-term conversations. Existing works usually train a dedicated retriever or summarizer to obtain key information from the past, which is time-consuming and depends heavily on the quality of labeled data. To alleviate this problem, we propose to recursively generate summaries (i.e., memory) with large language models (LLMs) to enhance long-term memory ability. Specifically, our method first prompts the LLM to memorize small dialogue contexts and then recursively produces new memory from the previous memory and the subsequent contexts. Finally, the LLM can easily generate a highly consistent response with the help of the latest memory. We evaluate our method with ChatGPT and text-davinci-003, and experiments on a widely used public dataset show that it generates more consistent responses in long-context conversations. Notably, our method is a potential solution for enabling LLMs to model extremely long contexts. Code and scripts will be released later.
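The abstract describes a recursive loop: summarize a short session into memory, fold each subsequent session into the previous memory, and condition the final response on the latest memory. Below is a minimal sketch of that loop, assuming a generic chat/completion API; the function `call_llm`, the prompt wording, and the session data structure are illustrative assumptions, not the paper's released implementation.

```python
# Sketch of recursive memory summarization for long-term dialogue.
# `call_llm` is a hypothetical stand-in for any chat/completion API.

from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat/completion API."""
    raise NotImplementedError


def update_memory(memory: str, session: List[Turn]) -> str:
    """Recursively fold the latest dialogue session into the running memory."""
    dialogue = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in session)
    prompt = (
        "Previous memory:\n"
        f"{memory or '(empty)'}\n\n"
        "New dialogue session:\n"
        f"{dialogue}\n\n"
        "Update the memory by merging the new session into the previous memory. "
        "Keep only information useful for future conversation."
    )
    return call_llm(prompt)


def generate_response(memory: str, context: List[Turn]) -> str:
    """Generate the next reply conditioned on the latest memory and local context."""
    dialogue = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in context)
    prompt = (
        f"Memory of earlier sessions:\n{memory}\n\n"
        f"Current dialogue:\n{dialogue}\n\n"
        "Respond consistently with the memory and the current dialogue:"
    )
    return call_llm(prompt)


def chat_over_sessions(sessions: List[List[Turn]]) -> str:
    """Process past sessions in order, recursively updating memory, then reply."""
    memory = ""
    for session in sessions[:-1]:
        memory = update_memory(memory, session)
    return generate_response(memory, sessions[-1])
```

Because only the fixed-size memory and the current session enter each prompt, the context passed to the LLM stays bounded no matter how many past sessions accumulate, which is how the approach can scale to extremely long conversations.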
