A Recurrent Neural Network Language Model Based on Word Embedding

Language modeling is one of the fundamental research problems in natural language processing and a prerequisite for more complex tasks such as speech recognition, machine translation, and question answering. In recent years, neural network language models have become a research hotspot and have substantially improved the effectiveness of language models. In this paper, a recurrent neural network language model (RNNLM) based on word embeddings is proposed, in which the embedding of each word is generated by pre-training the text data with the skip-gram model. An n-gram language model, an RNNLM based on one-hot encoding, and the proposed RNNLM based on word embeddings are evaluated on three different public datasets. The experimental results show that the RNNLM based on word embeddings performs best and significantly reduces the perplexity of the language model.
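To make the described pipeline concrete, the sketch below illustrates the two stages the abstract names: pre-training skip-gram embeddings on raw text, then initializing an RNN language model with those vectors instead of one-hot inputs. This is a minimal illustration assuming gensim and PyTorch; the paper does not specify a framework, and the toy corpus, hyperparameters, and class name `RNNLM` here are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# Stage 1: pre-train skip-gram word embeddings on the raw text
# (sg=1 selects the skip-gram architecture in gensim).
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]  # toy stand-in corpus
w2v = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=20)

# Build the embedding matrix in a fixed vocabulary order.
vocab = w2v.wv.index_to_key
weights = torch.tensor(w2v.wv[vocab])  # shape: (|V|, embedding_dim)

class RNNLM(nn.Module):
    """Recurrent language model whose input layer is initialized
    from the pre-trained skip-gram vectors rather than one-hot codes."""
    def __init__(self, pretrained, hidden_size=128):
        super().__init__()
        vocab_size, embed_dim = pretrained.shape
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token_ids, hidden=None):
        x = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        h, hidden = self.rnn(x, hidden)  # (batch, seq_len, hidden_size)
        return self.out(h), hidden       # logits over the vocabulary

model = RNNLM(weights)
```

Under this setup, the evaluation metric mentioned in the abstract, perplexity, would be computed as the exponential of the mean cross-entropy of the model's next-word predictions on held-out text.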