On Continuous Space Word Representations as Input of LSTM Language Model

Artificial neural networks have become the state of the art in the task of language modelling, with Long Short-Term Memory (LSTM) networks appearing to be an especially efficient architecture. The continuous skip-gram and the continuous bag-of-words (CBOW) are algorithms for learning high-quality distributed vector representations that are able to capture a large number of syntactic and semantic word relationships. In this paper, we carried out experiments with a combination of these powerful models: continuous word representations trained with the skip-gram, CBOW, or GloVe method, together with a word cache expressed as a vector using latent Dirichlet allocation (LDA). These are all used on the input of an LSTM network instead of the 1-of-N coding traditionally used in language models. The proposed models are tested on the Penn Treebank and MALACH corpora.
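To make the described input scheme concrete, the following is a minimal sketch (not the authors' implementation) of an LSTM language model whose input is a pretrained continuous word vector (e.g. from skip-gram, CBOW, or GloVe) concatenated with an LDA topic vector summarizing the recent word cache, in place of 1-of-N coding. The framework (PyTorch), class name, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingLDAInputLSTMLM(nn.Module):
    """Sketch: LSTM LM fed with pretrained word vectors + an LDA cache vector."""

    def __init__(self, pretrained_embeddings, lda_dim, hidden_dim, vocab_size):
        super().__init__()
        emb_dim = pretrained_embeddings.size(1)
        # Fixed, pretrained word vectors (skip-gram / CBOW / GloVe); kept frozen here.
        self.embed = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=True)
        # LSTM input is the concatenation of the word vector and the LDA topic vector.
        self.lstm = nn.LSTM(emb_dim + lda_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, lda_cache, state=None):
        # word_ids:   (batch, seq_len) token indices
        # lda_cache:  (batch, seq_len, lda_dim) topic distribution of the word cache
        x = torch.cat([self.embed(word_ids), lda_cache], dim=-1)
        h, state = self.lstm(x, state)
        return self.out(h), state

# Example with hypothetical sizes (10k vocabulary, 300-d embeddings, 50 LDA topics).
vocab_size, emb_dim, lda_dim, hidden_dim = 10000, 300, 50, 650
emb = torch.randn(vocab_size, emb_dim)  # placeholder for real pretrained vectors
model = EmbeddingLDAInputLSTMLM(emb, lda_dim, hidden_dim, vocab_size)
logits, _ = model(torch.randint(0, vocab_size, (8, 20)), torch.rand(8, 20, lda_dim))
```

The key design point reflected here is that the softmax output layer still predicts over the full vocabulary, while the input side replaces the sparse 1-of-N vector with a compact continuous representation augmented by cache/topic information.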