Michael W. Mahoney | Majid Jahani | Martin Takáč | Peter Richtárik | Sergey Rusakov | Zheng Shi
[1] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[2] James Martens. Deep learning via Hessian-free optimization, 2010, ICML.
[3] Jorge Nocedal, et al. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning, 2011, SIAM J. Optim.
[4] Kurt Keutzer, et al. ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning, 2020, AAAI.
[5] Yura Malitsky, et al. Adaptive gradient descent without descent, 2019, ICML.
[6] Albert S. Berahas, et al. Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample, 2019, ArXiv.
[7] Peter Richtárik, et al. New Convergence Aspects of Stochastic Gradient Algorithms, 2018, J. Mach. Learn. Res.
[8] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Mark W. Schmidt, et al. Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron, 2018, AISTATS.
[10] Albert S. Berahas, et al. Scaling Up Quasi-Newton Algorithms: Communication Efficient Distributed SR1, 2019, LOD.
[11] Albert S. Berahas, et al. SONIA: A Symmetric Blockwise Truncated Optimization Algorithm, 2020, AISTATS.
[12] Erik Meijer, et al. Gradient Descent: The Ultimate Optimizer, 2019, ArXiv.
[13] Aryan Mokhtari, et al. A Newton-Based Method for Nonconvex Optimization with Fast Evasion of Saddle Points, 2017, SIAM J. Optim.
[14] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[15] Peter Richtárik, et al. SGD: General Analysis and Improved Rates, 2019, ICML.
[16] Peng Xu, et al. Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study, 2017, SDM.
[17] Aryan Mokhtari, et al. Efficient Distributed Hessian Free Algorithm for Large-scale Empirical Risk Minimization via Accumulating Sample Strategy, 2018, AISTATS.
[18] Sharan Vaswani, et al. Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence, 2020, AISTATS.
[19] Peng Xu, et al. Newton-type methods for non-convex optimization under inexact Hessian information, 2017, Math. Program.
[20] Michael W. Mahoney, et al. Sub-sampled Newton methods, 2018, Math. Program.
[21] Jorge Nocedal, et al. A Multi-Batch L-BFGS Method for Machine Learning, 2016, NIPS.
[22] J. Nocedal, et al. Exact and Inexact Subsampled Newton Methods for Optimization, 2016, ArXiv abs/1609.08502.
[23] Yurii Nesterov. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[24] Yang Liu, et al. Newton-MR: Newton's Method Without Smoothness or Convexity, 2018, ArXiv.
[25] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[26] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[28] Mark W. Schmidt, et al. Online Learning Rate Adaptation with Hypergradient Descent, 2017, ICLR.
[29] Frank E. Curtis, et al. A Self-Correcting Variable-Metric Algorithm for Stochastic Optimization, 2016, ICML.
[30] R. Fletcher. Practical Methods of Optimization, 1988.
[31] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Y. Saad, et al. An estimator for the diagonal of a matrix, 2007.
[33] H. Robbins. A Stochastic Approximation Method, 1951.
[34] Michael W. Mahoney, et al. PyHessian: Neural Networks Through the Lens of the Hessian, 2019, 2020 IEEE International Conference on Big Data (Big Data).
[35] Peng Xu, et al. Inexact Non-Convex Newton-Type Methods, 2018, ArXiv abs/1802.06925.
[36] Aryan Mokhtari, et al. Global convergence of online limited memory BFGS, 2014, J. Mach. Learn. Res.
[37] Stephen J. Wright, et al. Numerical Optimization (Springer Series in Operations Research and Financial Engineering), 2000.
[38] Stephen J. Wright, et al. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[39] Jorge Nocedal, et al. An investigation of Newton-Sketch and subsampled Newton methods, 2017, Optim. Methods Softw.
[40] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Math. Program.
[41] J. J. Moré, et al. Quasi-Newton Methods, Motivation and Theory, 1974.
[42] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[43] Peter Richtárik, et al. SGD and Hogwild! Convergence Without the Bounded Gradients Assumption, 2018, ICML.
[44] Jie Liu, et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient, 2017, ICML.
[45] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.