Introspection unit in memory network: Learning to generalize inference in OOV scenarios

Abstract

Inference in natural language processing (NLP) is a challenging task. Although many models have been proposed in recent years, they are usually restricted to inference within a limited vocabulary or over handcrafted training templates. In this paper, we propose the introspection unit (IU), a new neural module that can be incorporated into memory networks to handle inference tasks in out-of-vocabulary (OOV) and rare-named-entity (RNE) scenarios. Specifically, when encountering a new word, the IU compares its part-of-speech context with the training dataset to retrieve a similar sample, and then embeds the new word at a target position to construct a simulated sample; the target position is located by part-of-speech tagging. Finally, using the simulated sample, the IU helps the memory network learn the context and characteristics of the new word. In experiments, we evaluate the effectiveness of the IU with a memory network on four inference datasets: a name OOV dataset, a place OOV dataset, a more challenging synthetic mixture OOV dataset, and a realistic dialogue dataset. The results demonstrate that the IU effectively generalizes the inference ability of memory networks to OOV scenarios and significantly improves inference accuracy. Furthermore, we visualize both the introspection process and the effect of the IU on word embeddings and memories.
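The simulated-sample construction described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the toy POS lexicon, the exact-match rule over POS sequences, and the convention that any token absent from the lexicon marks a target position are all assumptions made for the example.

```python
# Hypothetical sketch of the introspection unit's simulated-sample step:
# match the OOV sentence's part-of-speech context against training
# samples, then substitute the new word at the located target position.
# The tiny lexicon and matching rule below are illustrative assumptions.

TOY_POS = {
    "mary": "NNP", "john": "NNP", "went": "VBD", "moved": "VBD",
    "to": "TO", "the": "DT", "kitchen": "NN", "garden": "NN",
}

def pos_tag(tokens, lexicon=TOY_POS, oov_tag="NNP"):
    """Tag each token; unknown tokens get the OOV tag (proper noun here)."""
    return [lexicon.get(tok.lower(), oov_tag) for tok in tokens]

def build_simulated_sample(oov_sentence, training_samples):
    """Retrieve a training sentence whose POS sequence matches the OOV
    sentence, then embed the new word at the target position."""
    oov_tags = pos_tag(oov_sentence)
    for sample in training_samples:
        if pos_tag(sample) == oov_tags:
            simulated = list(sample)
            # Target positions: tokens unseen in the lexicon (the OOV words).
            for i, tok in enumerate(oov_sentence):
                if tok.lower() not in TOY_POS:
                    simulated[i] = tok
            return simulated
    return None  # no training sample with a matching POS context

train = [["mary", "went", "to", "the", "kitchen"]]
oov = ["daniel", "went", "to", "the", "kitchen"]
print(build_simulated_sample(oov, train))
# -> ['daniel', 'went', 'to', 'the', 'kitchen']
```

The memory network would then be trained on the simulated sample, letting the embedding of the new word inherit the context of the retrieved one.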
