A Neural Knowledge Language Model

Current language models have significant limitations in their ability to encode and decode knowledge, mainly because they acquire knowledge from statistical co-occurrences, even though much factual knowledge is carried by rarely observed named entities. In this paper, we propose a Neural Knowledge Language Model (NKLM) that combines the symbolic knowledge provided by a knowledge graph with an RNN language model. At each time step, the model predicts the fact on which the observed word is based; the word is then either generated from the vocabulary or copied from the knowledge graph. We train and test the model on a new dataset, WikiFacts. Our experiments show that the NKLM significantly improves perplexity while generating far fewer unknown words. In addition, we demonstrate that the descriptions it samples contain named entities that would be unknown words under a standard RNN language model.
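To make the per-step decision concrete, here is a minimal sketch of one NKLM-style decoding step in PyTorch: score candidate facts against the RNN state, then gate between copying from the chosen fact and generating from the ordinary vocabulary. All module names, dimensions, and the greedy fact selection are hypothetical illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class NKLMStep(nn.Module):
    """One decoding step of an NKLM-style model (illustrative sketch only)."""

    def __init__(self, hidden_dim, fact_dim, vocab_size):
        super().__init__()
        self.fact_scorer = nn.Linear(hidden_dim, fact_dim)      # projects state to fact space
        self.copy_gate = nn.Linear(hidden_dim + fact_dim, 1)    # copy vs. generate decision
        self.vocab_out = nn.Linear(hidden_dim + fact_dim, vocab_size)

    def forward(self, h_t, fact_embs):
        # h_t: (batch, hidden_dim) RNN state
        # fact_embs: (batch, n_facts, fact_dim) embeddings of the entity's facts
        batch = h_t.size(0)

        # 1) Predict the fact on which the next word should be based.
        query = self.fact_scorer(h_t).unsqueeze(2)              # (batch, fact_dim, 1)
        fact_logits = torch.bmm(fact_embs, query).squeeze(2)    # (batch, n_facts)
        fact_probs = fact_logits.softmax(dim=1)
        chosen = fact_probs.argmax(dim=1)                       # greedy pick, for illustration
        a_t = fact_embs[torch.arange(batch), chosen]            # (batch, fact_dim)

        # 2) Either copy a word from the predicted fact or generate
        #    a word from the regular vocabulary.
        ctx = torch.cat([h_t, a_t], dim=1)
        p_copy = torch.sigmoid(self.copy_gate(ctx))             # P(copy | context)
        vocab_logits = self.vocab_out(ctx)                      # used when not copying
        return fact_probs, p_copy, vocab_logits
```

Under this reading, a named entity such as a person's birthplace can be emitted by copying it from the fact's object, so it never needs to appear in the softmax vocabulary at all.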
