The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models

In this paper, we propose a new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation. FOFE models word order in a sequence through a simple ordinally-forgetting mechanism based on the positions of the words. In this work, we apply FOFE to feedforward neural network language models (FNN-LMs). Experimental results show that, without using any recurrent feedback, FOFE-based FNN-LMs can significantly outperform not only standard fixed-input FNN-LMs but also the popular recurrent neural network (RNN) LMs.
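
The abstract does not spell out the encoding rule itself. Below is a minimal sketch, assuming the standard FOFE recursion z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot vector of the t-th word and 0 < alpha < 1 is the forgetting factor; the function name and example values are illustrative only.

```python
import numpy as np

def fofe_encode(word_ids, vocab_size, alpha=0.7):
    """Encode a variable-length word sequence into a fixed-size vector.

    Applies the recursion z_t = alpha * z_{t-1} + e_t, where e_t is the
    one-hot vector of the t-th word and 0 < alpha < 1 is the forgetting
    factor, so earlier words are exponentially down-weighted.
    """
    z = np.zeros(vocab_size)
    for w in word_ids:
        z = alpha * z      # decay the contribution of all earlier words
        z[w] += 1.0        # add the one-hot vector of the current word
    return z

# The same words in a different order yield different codes,
# which is how FOFE preserves word-order information.
print(fofe_encode([0, 2, 1], vocab_size=4))  # [0.49, 1.0, 0.7, 0.0]
print(fofe_encode([1, 2, 0], vocab_size=4))  # [1.0, 0.49, 0.7, 0.0]
```

Such a fixed-size code can then be fed directly into a standard FNN-LM in place of the concatenated context-word embeddings used by fixed-input FNN-LMs.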
