Quoc V. Le | Barret Zoph | Vijay Vasudevan | Irwan Bello
[1] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[2] Martin A. Riedmiller,et al. RPROP - A Fast Adaptive Learning Algorithm , 1992 .
[3] Jürgen Schmidhuber. Steps Towards 'Self-Referential' Neural Learning: A Thought Experiment ; CU-CS-627-92 , 1992 .
[4] Jürgen Schmidhuber,et al. Steps Towards 'Self-Referential' Neural Learning: A Thought Experiment , 1992 .
[5] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[6] Samy Bengio,et al. Use of genetic programming for the search of a new learning rule for neural networks , 1994, Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence.
[7] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[8] Magnus Thor Jonsson,et al. Evolution and design of distributed learning rules , 2000, 2000 IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks.
[9] Sepp Hochreiter,et al. Learning to Learn Using Gradient Descent , 2001, ICANN.
[10] Nicol N. Schraudolph,et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent , 2002, Neural Computation.
[11] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[12] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[13] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[14] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[15] Quoc V. Le,et al. On optimization methods for deep learning , 2011, ICML.
[16] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[17] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[18] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[19] Ilya Sutskever,et al. Training Deep and Recurrent Networks with Hessian-Free Optimization , 2012, Neural Networks: Tricks of the Trade.
[20] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[21] Tom Schaul,et al. No more pesky learning rates , 2012, ICML.
[22] Razvan Pascanu,et al. Revisiting Natural Gradient for Deep Networks , 2013, ICLR.
[23] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[24] Quoc V. Le,et al. Adding Gradient Noise Improves Learning for Very Deep Networks , 2015, ArXiv.
[25] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[26] Zoubin Ghahramani,et al. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.
[27] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Restarts , 2016, ArXiv.
[28] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[29] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.
[30] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[31] Lin Wang,et al. The evolution of a generalized neural learning rule , 2016, 2016 International Joint Conference on Neural Networks (IJCNN).
[32] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[33] Roger B. Grosse,et al. Distributed Second-Order Optimization using Kronecker-Factored Approximations , 2016, ICLR.
[34] Hugo Larochelle,et al. Optimization as a Model for Few-Shot Learning , 2016, ICLR.
[35] Hakan Inan,et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.
[36] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[37] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[38] Jitendra Malik,et al. Learning to Optimize Neural Nets , 2017, ArXiv.
[39] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[40] Quoc V. Le,et al. Neural Architecture Search with Reinforcement Learning , 2016, ICLR.
[41] Ramesh Raskar,et al. Designing Neural Network Architectures using Reinforcement Learning , 2016, ICLR.
[42] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[43] D. Sculley,et al. Google Vizier: A Service for Black-Box Optimization , 2017, KDD.
[44] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[45] Misha Denil,et al. Learned Optimizers that Scale and Generalize , 2017, ICML.
[46] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[47] Jitendra Malik,et al. Learning to Optimize , 2016, ICLR.
[48] Vijay Vasudevan,et al. Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[49] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.