genCNN: A Convolutional Architecture for Word Sequence Prediction

We propose a novel convolutional architecture, named $gen$CNN, for word sequence prediction. Different from previous work on neural network-based language modeling and generation (e.g., RNN or LSTM), we choose not to greedily summarize the history of words as a fixed length vector. Instead, we use a convolutional neural network to predict the next word with the history of words of variable length. Also different from the existing feedforward networks for language modeling, our model can effectively fuse the local correlation and global correlation in the word sequence, with a convolution-gating strategy specifically designed for the task. We argue that our model can give adequate representation of the history, and therefore can naturally exploit both the short and long range dependencies. Our model is fast, easy to train, and readily parallelized. Our extensive experiments on text generation and $n$-best re-ranking in machine translation show that $gen$CNN outperforms the state-of-the-arts with big margins.

[1]  Richard M. Schwartz,et al.  Fast and Robust Neural Network Joint Models for Statistical Machine Translation , 2014, ACL.

[2]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[3]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[4]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[5]  Sanjeev Khudanpur,et al.  Maximum entropy language modeling with non-local dependencies , 2003 .

[6]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[7]  Gerald Penn,et al.  Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  Ah Chung Tsoi,et al.  Face recognition: a convolutional neural-network approach , 1997, IEEE Trans. Neural Networks.

[9]  Ashish Vaswani,et al.  Decoding with Large-Scale Neural Language Models Improves Translation , 2013, EMNLP.

[10]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[11]  Phil Blunsom,et al.  Recurrent Continuous Translation Models , 2013, EMNLP.

[12]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[13]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[14]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[15]  Jianfeng Gao,et al.  Domain Adaptation via Pseudo In-Domain Data Selection , 2011, EMNLP.

[16]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[17]  Hang Li,et al.  Convolutional Neural Network Architectures for Matching Natural Language Sentences , 2014, NIPS.

[18]  David Silver,et al.  Move Evaluation in Go Using Deep Convolutional Neural Networks , 2014, ICLR.

[19]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[20]  Geoffrey Zweig,et al.  Joint Language and Translation Modeling with Recurrent Neural Networks , 2013, EMNLP.

[21]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[22]  Robert L. Mercer,et al.  An Estimate of an Upper Bound for the Entropy of English , 1992, CL.

[23]  Dan Klein,et al.  Fast Exact Inference with a Factored Model for Natural Language Parsing , 2002, NIPS.

[24]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[25]  Tara N. Sainath,et al.  Improving deep neural networks for LVCSR using rectified linear units and dropout , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[26]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[27]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .