Language Modeling Using Part-of-speech and Long Short-Term Memory Networks

In recent years, neural networks have been widely used for language modeling across various natural language processing tasks. Results show that long short-term memory (LSTM) networks are well suited to language modeling due to their ability to process long sequences. Furthermore, many studies have shown that incorporating additional information improves the performance of language models (LMs). In this research, we propose parallel structures for incorporating part-of-speech tags into the language modeling task, using both unidirectional and bidirectional LSTMs. Words and part-of-speech tags are fed to the network as parallel inputs, and two different structures are proposed for merging these two paths, depending on the type of network used in the parallel part. We evaluate the models on the Penn Treebank (PTB) dataset using perplexity. Both proposed structures show improvements over the baseline models. Not only does the bidirectional LSTM variant achieve the lowest perplexity, it also has the fewest trainable parameters among our proposed methods. Perplexity is reduced by 1.5% and 13% for the unidirectional and bidirectional LSTMs, respectively.
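As a concrete illustration of the parallel structure described above, the sketch below shows one plausible way to combine a word path and a POS-tag path in PyTorch. The class name, embedding and hidden dimensions, and the point at which the two paths are concatenated are illustrative assumptions, not the paper's exact configuration; the abstract only specifies that words and tags are processed as parallel inputs and later merged.

# Minimal sketch (assumptions noted above), not the authors' exact model:
# two parallel LSTM paths, one over word ids and one over POS-tag ids,
# concatenated before the output projection to the word vocabulary.
import torch
import torch.nn as nn

class ParallelPOSLanguageModel(nn.Module):
    def __init__(self, vocab_size, tag_size, emb_dim=200, hidden_dim=200,
                 bidirectional=False):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.tag_emb = nn.Embedding(tag_size, emb_dim)
        # The "parallel part": the bidirectional flag controls whether the
        # two paths are unidirectional or bidirectional LSTMs.
        self.word_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                 bidirectional=bidirectional)
        self.tag_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                bidirectional=bidirectional)
        num_dirs = 2 if bidirectional else 1
        # Concatenate the word and tag paths, then project to word logits.
        self.out = nn.Linear(2 * num_dirs * hidden_dim, vocab_size)

    def forward(self, words, tags):
        w, _ = self.word_lstm(self.word_emb(words))  # (B, T, dirs*hidden)
        t, _ = self.tag_lstm(self.tag_emb(tags))     # (B, T, dirs*hidden)
        return self.out(torch.cat([w, t], dim=-1))   # (B, T, vocab_size)

# Example forward pass, using PTB-like sizes (10k word vocabulary,
# 45 Penn Treebank POS tags) purely for illustration:
model = ParallelPOSLanguageModel(vocab_size=10000, tag_size=45,
                                 bidirectional=True)
words = torch.randint(0, 10000, (8, 35))  # batch of word-id sequences
tags = torch.randint(0, 45, (8, 35))      # matching POS-tag ids
logits = model(words, tags)               # shape (8, 35, 10000)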
