暂无分享,去创建一个
[1] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[2] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[3] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[4] Zoubin Ghahramani,et al. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.
[5] Ruslan Salakhutdinov,et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model , 2017, ICLR.
[6] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[7] Razvan Pascanu,et al. On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.
[8] Timothy Doster,et al. Gradual DropIn of Layers to Train Very Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[10] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[11] Franco Scarselli,et al. On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures , 2014, IEEE Transactions on Neural Networks and Learning Systems.
[12] Benjamin Schrauwen,et al. Training and Analysing Deep Recurrent Neural Networks , 2013, NIPS.
[13] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[15] Jürgen Schmidhuber,et al. Recurrent Highway Networks , 2016, ICML.
[16] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[17] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[18] Yoshua Bengio,et al. Hierarchical Multiscale Recurrent Neural Networks , 2016, ICLR.
[19] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[20] Richard Socher,et al. Regularizing and Optimizing LSTM Language Models , 2017, ICLR.
[21] H. Shimodaira,et al. Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000 .
[22] Tianqi Chen,et al. Net2Net: Accelerating Learning via Knowledge Transfer , 2015, ICLR.
[23] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[24] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.
[25] Aaron C. Courville,et al. Recurrent Batch Normalization , 2016, ICLR.
[26] Yoshua Bengio,et al. Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.
[27] Steve Renals,et al. Dynamic Evaluation of Neural Sequence Models , 2017, ICML.
[28] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[29] Quoc V. Le,et al. HyperNetworks , 2016, ICLR.
[30] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[31] Yoshua Bengio,et al. Hierarchical Recurrent Neural Networks for Long-Term Dependencies , 1995, NIPS.