Memory-assisted prompt editing to improve GPT-3 after deployment

Large LMs such as GPT-3, while powerful, are not immune to mistakes, but are prohibitively costly to retrain. One failure mode is misinterpreting a user's instruction (e.g., GPT-3 interpreting "What word is similar to good?" to mean a homonym, while the user intended a synonym). Our goal is to allow users to correct such errors directly through interaction, without retraining. Our approach pairs GPT-3 with a growing memory of cases where the model misunderstood the user's intent and was provided with feedback clarifying the instruction. Given a new query, our memory-enhanced GPT-3 uses feedback from similar, prior queries to enrich the prompt. Through simple proof-of-concept experiments, we show how a (simulated) user can interactively teach a deployed GPT-3, doubling its accuracy on basic lexical tasks (e.g., generate a synonym) where users query in different, novel (often misunderstood) ways. In such scenarios, memory helps avoid repeating similar past mistakes. Our simple idea is a first step towards strengthening deployed models, potentially broadening their utility.
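The retrieve-and-enrich loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class and function names (`FeedbackMemory`, `build_prompt`) are hypothetical, and string similarity via `difflib` stands in for whatever retrieval the deployed system would use (e.g., embedding-based nearest neighbors).

```python
import difflib


class FeedbackMemory:
    """Growing store of (misunderstood query, clarifying feedback) pairs."""

    def __init__(self):
        self.entries = []  # list of (query, feedback) tuples

    def save(self, query, feedback):
        self.entries.append((query, feedback))

    def lookup(self, query, threshold=0.6):
        """Return feedback attached to the most similar past query, if any."""
        best_feedback, best_score = None, 0.0
        for past_query, feedback in self.entries:
            score = difflib.SequenceMatcher(
                None, query.lower(), past_query.lower()
            ).ratio()
            if score > best_score:
                best_feedback, best_score = feedback, score
        return best_feedback if best_score >= threshold else None


def build_prompt(query, memory):
    """Enrich the raw query with retrieved clarifying feedback, if found."""
    feedback = memory.lookup(query)
    if feedback is None:
        return query
    return f"{query} [clarification: {feedback}]"


memory = FeedbackMemory()
# The user corrects one misunderstanding once...
memory.save(
    "What word is similar to good?",
    "'similar to' means a word with the same meaning (a synonym)",
)
# ...and the stored feedback transfers to a related, novel phrasing,
# so the enriched prompt steers the model away from the past mistake.
print(build_prompt("What word is similar to happy?", memory))
```

The enriched prompt (query plus retrieved clarification) would then be sent to the model in place of the raw query; queries with no sufficiently similar memory entry pass through unchanged.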
