Augmenting Transformers with KNN-Based Composite Memory for Dialog

Various machine learning tasks can benefit from access to external information of different modalities, such as text and images. Recent work has focused on learning architectures with large memories capable of storing this knowledge. We propose augmenting generative Transformer neural networks with KNN-based Information Fetching (KIF) modules. Each KIF module learns a read operation to access fixed external knowledge. We apply these modules to generative dialog modeling, a challenging task where information must be flexibly retrieved and incorporated to maintain the topic and flow of conversation. We demonstrate the effectiveness of our approach by identifying the relevant knowledge required for knowledgeable but engaging dialog from Wikipedia, images, and human-written dialog utterances, and show that leveraging this retrieved information improves model performance, as measured by both automatic and human evaluation.
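
To make the read operation concrete, the sketch below shows one way a KIF-style module could be wired into a Transformer in PyTorch: a fixed set of pre-encoded knowledge keys and values, a learned projection that maps the model's hidden state into the key space, a top-k nearest-neighbor lookup, and a learned gate that mixes the retrieved content back into the hidden state. This is a minimal illustration under stated assumptions; the class name `KIFModule`, the gating scheme, and all dimensions are hypothetical, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class KIFModule(nn.Module):
    """Illustrative KIF-style module: a learned read over fixed external memory."""

    def __init__(self, d_model: int, memory_keys: torch.Tensor,
                 memory_values: torch.Tensor, k: int = 5):
        super().__init__()
        self.k = k
        # Fixed external knowledge (e.g. pre-encoded Wikipedia sentences);
        # buffers travel with the model but receive no gradient updates.
        self.register_buffer("keys", memory_keys)      # (n_mem, d_key)
        self.register_buffer("values", memory_values)  # (n_mem, d_model)
        # Learned read operation: project the dialog state into key space.
        self.query_proj = nn.Linear(d_model, memory_keys.size(-1))
        # Learned gate controlling how much retrieved knowledge is mixed in.
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d_model) pooled Transformer state for the dialog context.
        query = self.query_proj(hidden)                      # (batch, d_key)
        scores = query @ self.keys.t()                       # (batch, n_mem)
        top_scores, top_idx = scores.topk(self.k, dim=-1)    # KNN over the memory
        weights = top_scores.softmax(dim=-1)                 # (batch, k)
        fetched = self.values[top_idx]                       # (batch, k, d_model)
        read = (weights.unsqueeze(-1) * fetched).sum(dim=1)  # weighted fetch
        g = torch.sigmoid(self.gate(torch.cat([hidden, read], dim=-1)))
        return hidden + g * read                             # gated residual mix


# Toy usage with random "knowledge": 1000 memory slots, 64-dim model.
mem_k = torch.randn(1000, 64)
mem_v = torch.randn(1000, 64)
kif = KIFModule(d_model=64, memory_keys=mem_k, memory_values=mem_v, k=5)
out = kif(torch.randn(8, 64))  # (8, 64)
```

Because the memory keys and values are registered as non-trainable buffers, the external knowledge stays fixed while only the query projection and gate are trained, which matches the abstract's framing of a learned read operation over fixed external knowledge.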
