Quoc V. Le | Lukasz Kaiser | Ilya Sutskever | Luke Vilnis | Arvind Neelakantan | Karol Kurach | James Martens
[1] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[2] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.
[3] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.
[4] Alan F. Murray, et al. Synaptic Weight Noise During MLP Learning Enhances Fault-Tolerance, Generalization and Learning Trajectory, 1992, NIPS.
[5] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[6] Mark Steijvers, et al. A Recurrent Network that performs a Context-Sensitive Prediction Task, 1996.
[7] Guozhong An, et al. The Effects of Adding Noise During Backpropagation Training on a Generalization Performance, 1996, Neural Computation.
[8] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[9] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[10] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[11] Geoffrey E. Hinton, et al. Stochastic Neighbor Embedding, 2002, NIPS.
[12] H. Robbins. A Stochastic Approximation Method, 1951.
[13] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[14] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[15] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[16] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[17] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.
[18] Alex Graves, et al. Practical Variational Inference for Neural Networks, 2011, NIPS.
[19] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[20] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[21] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.
[22] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[23] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[24] Yee Whye Teh, et al. Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex, 2013, NIPS.
[25] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.
[26] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[27] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, arXiv.
[28] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[29] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[30] Yoshua Bengio, et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches, 2014, SSST@EMNLP.
[31] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[32] David Sussillo, et al. Random Walks: Training Very Deep Nonlinear Feed-Forward Networks with Smart Initialization, 2014, arXiv.
[33] Alex Graves, et al. Neural Turing Machines, 2014, arXiv.
[34] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[35] Wojciech Zaremba, et al. Recurrent Neural Network Regularization, 2014, arXiv.
[36] Kam-Fai Wong, et al. Towards Neural Network-based Reasoning, 2015, arXiv.
[37] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[38] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.
[39] Jason Weston, et al. Memory Networks, 2014, ICLR.
[40] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[41] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[42] Kaisheng Yao, et al. Depth-Gated Recurrent Neural Networks, 2015.
[43] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[44] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.
[45] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, arXiv.
[46] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[47] Lawrence Carin, et al. Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks, 2015, AAAI.
[48] Hossein Mobahi, et al. Training Recurrent Neural Networks by Diffusion, 2016, arXiv.
[49] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[50] Lukasz Kaiser, et al. Neural GPUs Learn Algorithms, 2015, ICLR.
[51] Xinyun Chen. Delving into Transferable Adversarial Examples and Black-box Attacks, 2016.
[52] Zhe Gan, et al. Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization, 2015, AISTATS.
[53] Marcin Andrychowicz, et al. Neural Random Access Machines, 2015, ERCIM News.
[54] Jason Weston, et al. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks, 2015, ICLR.
[55] Quoc V. Le, et al. Neural Programmer: Inducing Latent Programs with Gradient Descent, 2015, ICLR.
[56] Venu Govindaraju, et al. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks, 2016, ICML.
[57] Ying Zhang, et al. Batch normalized recurrent neural networks, 2015, ICASSP.
[58] Aaron C. Courville, et al. Recurrent Batch Normalization, 2016, ICLR.