The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models

In this paper, we propose a new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation. FOFE models word order in a sequence through a simple ordinally-forgetting mechanism based on the positions of the words. In this work, we apply FOFE to feedforward neural network language models (FNN-LMs). Experimental results show that, without using any recurrent feedback, FOFE-based FNN-LMs can significantly outperform not only standard fixed-input FNN-LMs but also the popular recurrent neural network (RNN) LMs.
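
The abstract does not spell out the encoding rule itself. Below is a minimal sketch, assuming the standard FOFE recursion z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot vector of the t-th word and 0 < alpha < 1 is the forgetting factor; the function name and example values are illustrative only.

```python
import numpy as np

def fofe_encode(word_ids, vocab_size, alpha=0.7):
    """Encode a variable-length word sequence into a fixed-size vector.

    Applies the recursion z_t = alpha * z_{t-1} + e_t, where e_t is the
    one-hot vector of the t-th word and 0 < alpha < 1 is the forgetting
    factor, so earlier words are exponentially down-weighted.
    """
    z = np.zeros(vocab_size)
    for w in word_ids:
        z = alpha * z      # decay the contribution of all earlier words
        z[w] += 1.0        # add the one-hot vector of the current word
    return z

# The same words in a different order yield different codes,
# which is how FOFE preserves word-order information.
print(fofe_encode([0, 2, 1], vocab_size=4))  # [0.49, 1.0, 0.7, 0.0]
print(fofe_encode([1, 2, 0], vocab_size=4))  # [1.0, 0.49, 0.7, 0.0]
```

Such a fixed-size code can then be fed directly into a standard FNN-LM in place of the concatenated context-word embeddings used by fixed-input FNN-LMs.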
