Language Modeling with Sparse Product of Sememe Experts

Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we use sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling, named Sememe-Driven Language Model (SDLM). More specifically, to predict the next word, SDLM first estimates the sememe distribution given the textual context. Afterward, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation to fine-grained sememe-level semantics, and offers more powerful tools to fine-tune language models and improve their interpretability as well as their robustness. Experiments on language modeling and the downstream application of headline generation demonstrate the effectiveness of SDLM. Source code and data used in the experiments can be accessed at https://github.com/thunlp/SDLM-pytorch.
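To make the two-stage prediction described above concrete, the following is a minimal sketch of how a sememe-driven output layer could be wired up. The names (`sememe_predictor`, `expert_logits`, `sememe_word_mask`), shapes, and the particular gating and normalization choices are illustrative assumptions, not the authors' exact formulation, which is given in the paper and the released PyTorch code.

```python
import torch
import torch.nn.functional as F

# Sketch of the two-stage idea (assumed shapes and gating, not the reference implementation):
# 1) predict how relevant each sememe is given the context,
# 2) let each sememe act as an expert that only scores words annotated with it (sparsity),
# 3) combine the experts multiplicatively (product of experts) via a weighted sum of logits.

def sememe_driven_word_probs(context, sememe_predictor, expert_logits, sememe_word_mask):
    """
    context:          (batch, hidden)      context vector from the recurrent encoder
    sememe_predictor: nn.Linear(hidden, n_sememes)
    expert_logits:    (n_sememes, vocab)   each row holds one sememe expert's word scores
    sememe_word_mask: (n_sememes, vocab)   1 if the word is annotated with that sememe, else 0
    """
    # Stage 1: per-sememe relevance given the context.
    q = torch.sigmoid(sememe_predictor(context))          # (batch, n_sememes)

    # Stage 2: each expert scores only the words carrying its sememe.
    masked_logits = expert_logits * sememe_word_mask      # (n_sememes, vocab)

    # Weighted combination of expert logits = product of experts in probability space.
    word_logits = q @ masked_logits                       # (batch, vocab)
    return F.softmax(word_logits, dim=-1)
```

Under these assumptions, the sparsity comes from the sememe-word annotation mask: an expert contributes to a word's score only if HowNet marks that word with the expert's sememe, so most expert-word interactions are zero.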
