The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia C. Wilson | Rebecca Roelofs | Mitchell Stern | Nathan Srebro | Benjamin Recht
[1] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[2] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[3] Alexei A. Efros, et al. Image-to-Image Translation with Conditional Adversarial Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[5] Y. Yao, et al. On Early Stopping in Gradient Descent Learning, 2007.
[6] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[7] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[8] Eugene Charniak, et al. Parsing as Language Modeling, 2016, EMNLP.
[9] Matthew J. Streeter, et al. Adaptive Bound Optimization for Online Convex Optimization, 2010, COLT.
[10] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[11] Yoram Singer, et al. Train faster, generalize better: Stability of stochastic gradient descent, 2015, ICML.
[12] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[13] Bernt Schiele, et al. Generative Adversarial Text to Image Synthesis, 2016, ICML.
[14] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.
[16] James Cross, et al. Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles, 2016, EMNLP.
[17] Mikhail Belkin, et al. Diving into the shallows: a computational perspective on large-scale shallow learning, 2017, NIPS.
[18] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[19] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[20] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.