Language Modeling with Gated Convolutional Networks
Yann Dauphin | Angela Fan | David Grangier | Michael Auli
[1] Martin Kay, et al. Syntactic Process, 1979, ACL.
[2] Hermann Ney, et al. Improved backing-off for M-gram language modeling, 1995, International Conference on Acoustics, Speech, and Signal Processing.
[3] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[4] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.
[5] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.
[6] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.
[7] Hinrich Schütze, et al. Foundations of Statistical Natural Language Processing, 1999, CL.
[8] Yoshua Bengio, et al. Hierarchical Probabilistic Neural Network Language Model, 2005, AISTATS.
[9] Geoffrey E. Hinton, et al. Three new graphical models for statistical language modelling, 2007, ICML.
[10] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[11] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[12] Aapo Hyvärinen, et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, 2010, AISTATS.
[13] Clément Farabet, et al. Torch7: A Matlab-like Environment for Machine Learning, 2011, NIPS Workshop.
[14] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[15] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[16] Dong Yu, et al. Automatic Speech Recognition: A Deep Learning Approach, 2014.
[17] Joris Pelemans, et al. Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation, 2014, ArXiv.
[18] Thorsten Brants, et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.
[19] Qun Liu, et al. genCNN: A Convolutional Architecture for Word Sequence Prediction, 2015, ACL.
[20] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[21] Alex Graves, et al. Conditional Image Generation with PixelCNN Decoders, 2016, NIPS.
[22] Pradeep Dubey, et al. BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies, 2015, ICLR.
[23] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Yonghui Wu, et al. Exploring the Limits of Language Modeling, 2016, ArXiv.
[25] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[26] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[27] Wenlin Chen, et al. Strategies for Training Large Vocabulary Neural Language Models, 2015, ACL.
[28] Yann Dauphin, et al. Predicting distributions with Linearizing Belief Networks, 2016, ICLR.
[29] Nicolas Usunier, et al. Improving Neural Language Models with a Continuous Cache, 2016, ICLR.
[30] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[31] Boris Ginsburg, et al. Factorization tricks for LSTM networks, 2017, ICLR.
[32] Moustapha Cissé, et al. Efficient softmax approximation for GPUs, 2016, ICML.
[33] Miles Osborne, et al. Statistical Machine Translation, 2010, Encyclopedia of Machine Learning and Data Mining.
[34] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.