Stanley Osher | Minh Pham | Penghang Yin | Xiyang Luo | Bao Wang | Alex Tong Lin