A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization
Minghan Yang | Andre Milzarek | Zaiwen Wen | Tong Zhang
[1] Yunda Dong. An extension of Luque's growth condition , 2009, Appl. Math. Lett..
[2] Takuya Akiba,et al. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes , 2017, ArXiv.
[3] Lin Xiao,et al. A Proximal Stochastic Gradient Method with Progressive Variance Reduction , 2014, SIAM J. Optim..
[4] Michael Ulbrich,et al. A Stochastic Semismooth Newton Method for Nonsmooth Nonconvex Optimization , 2018, SIAM J. Optim..
[5] J. Nocedal. Updating Quasi-Newton Matrices With Limited Storage , 1980 .
[6] Saeed Ghadimi,et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization , 2013, Mathematical Programming.
[7] Ambuj Tewari,et al. Stochastic methods for l1 regularized loss minimization , 2009, ICML '09.
[8] Jong-Shi Pang,et al. Nonsmooth Equations: Motivation and Algorithms , 1993, SIAM J. Optim..
[9] Radford M. Neal. Pattern Recognition and Machine Learning , 2007, Technometrics.
[10] Defeng Sun,et al. Newton and Quasi-Newton Methods for a Class of Nonsmooth Equations and Related Problems , 1997, SIAM J. Optim..
[11] Yongfeng Li,et al. A Regularized Semi-Smooth Newton Method with Projection Steps for Composite Convex Programs , 2016, J. Sci. Comput..
[12] Jorge Nocedal,et al. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning , 2011, SIAM J. Optim..
[13] Lam M. Nguyen,et al. ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization , 2019, J. Mach. Learn. Res..
[14] Liqun Qi,et al. A nonsmooth version of Newton's method , 1993, Math. Program..
[15] Suvrit Sra,et al. Fast stochastic optimization on Riemannian manifolds , 2016, ArXiv.
[16] Dmitriy Drusvyatskiy,et al. Stochastic model-based minimization of weakly convex functions , 2018, SIAM J. Optim..
[17] Chih-Jen Lin,et al. LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..
[18] Liqun Qi,et al. On superlinear convergence of quasi-Newton methods for nonsmooth equations , 1997, Oper. Res. Lett..
[19] Anthony Man-Cho So,et al. Quadratic optimization with orthogonality constraint: explicit Łojasiewicz exponent and linear convergence of retraction-based line-search and stochastic variance-reduced gradient methods , 2018, Mathematical Programming.
[20] Mark W. Schmidt,et al. Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.
[21] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[22] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[23] H. Robbins. A Stochastic Approximation Method , 1951 .
[24] Jialei Wang,et al. Utilizing Second Order Information in Minibatch Stochastic Variance Reduced Proximal Iterations , 2019, J. Mach. Learn. Res..
[25] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[26] Aryan Mokhtari,et al. IQN: An Incremental Quasi-Newton Method with Local Superlinear Convergence Rate , 2017, SIAM J. Optim..
[27] James Demmel,et al. ImageNet Training in Minutes , 2017, ICPP.
[28] Jingwei Liang,et al. Local Convergence Properties of SAGA/Prox-SVRG and Acceleration , 2018, ICML.
[29] Peng Xu,et al. Newton-type methods for non-convex optimization under inexact Hessian information , 2017, Math. Program..
[30] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[31] Wotao Yin,et al. Block Stochastic Gradient Iteration for Convex and Nonconvex Optimization , 2014, SIAM J. Optim..
[32] Guillermo Sapiro,et al. Online dictionary learning for sparse coding , 2009, ICML '09.
[33] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[34] Vincent Y. F. Tan,et al. Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies , 2017, IEEE Transactions on Signal Processing.
[35] Roger B. Grosse,et al. A Kronecker-factored approximate Fisher matrix for convolution layers , 2016, ICML.
[36] Satoshi Matsuoka,et al. Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] J. Moreau. Proximité et dualité dans un espace hilbertien , 1965 .
[38] Naman Agarwal,et al. Second-Order Stochastic Optimization for Machine Learning in Linear Time , 2016, J. Mach. Learn. Res..
[39] Quanquan Gu,et al. Stochastic Nested Variance Reduction for Nonconvex Optimization , 2018, J. Mach. Learn. Res..
[40] Jie Liu,et al. SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient , 2017, ICML.
[41] A. Willsky,et al. Sparse and low-rank matrix decompositions , 2009 .
[42] Nicholas I. M. Gould,et al. Trust Region Methods , 2000, MOS-SIAM Series on Optimization.
[43] Chia-Hua Ho,et al. An improved GLMNET for l1-regularized logistic regression , 2011, J. Mach. Learn. Res..
[44] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[45] Yin Zhang,et al. A Fast Algorithm for Sparse Reconstruction Based on Shrinkage, Subspace Optimization, and Continuation , 2010, SIAM J. Sci. Comput..
[46] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..
[47] Tong Zhang,et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator , 2018, NeurIPS.
[48] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[49] Patrick L. Combettes,et al. Signal Recovery by Proximal Forward-Backward Splitting , 2005, Multiscale Model. Simul..
[50] Stephen P. Boyd,et al. Proximal Algorithms , 2013, Found. Trends Optim..
[51] Xiaojun Chen,et al. A parameterized Newton method and a quasi-Newton method for nonsmooth equations , 1994, Comput. Optim. Appl..
[52] Michael I. Jordan,et al. A Linearly-Convergent Stochastic L-BFGS Algorithm , 2015, AISTATS.
[53] Yi Zhou,et al. SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms , 2018 .
[54] Marten van Dijk,et al. Finite-sum smooth optimization with SARAH , 2019, Computational Optimization and Applications.
[55] Martin J. Wainwright,et al. Newton Sketch: A Near Linear-Time Optimization Algorithm with Linear-Quadratic Convergence , 2015, SIAM J. Optim..
[56] Simon Günter,et al. A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.
[57] Aryan Mokhtari,et al. Global convergence of online limited memory BFGS , 2014, J. Mach. Learn. Res..
[58] Ya-Xiang Yuan,et al. Stochastic proximal quasi-Newton methods for non-convex composite optimization , 2019, Optim. Methods Softw..
[59] Zaïd Harchaoui,et al. A Universal Catalyst for First-Order Optimization , 2015, NIPS.
[60] Jie Liu,et al. Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting , 2015, IEEE Journal of Selected Topics in Signal Processing.
[61] Christian Kirches,et al. An SR1/BFGS SQP algorithm for nonconvex nonlinear programs with block-diagonal Hessian matrix , 2016, Math. Program. Comput..
[62] Liqun Qi,et al. Convergence Analysis of Some Algorithms for Solving Nonsmooth Equations , 1993, Math. Oper. Res..
[63] Dmitriy Drusvyatskiy,et al. Error Bounds, Quadratic Growth, and Linear Convergence of Proximal Methods , 2016, Math. Oper. Res..
[64] Changfeng Ma,et al. A globally and superlinearly convergent quasi-Newton method for general box constrained variational inequalities without smoothing approximation , 2011, J. Glob. Optim..
[65] Z.-Q. Luo,et al. Error bounds and convergence analysis of feasible descent methods: a general approach , 1993, Ann. Oper. Res..
[66] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.
[67] Saeed Ghadimi,et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming , 2013, Mathematical Programming.
[68] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .
[69] Tong Zhang,et al. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Mathematical Programming.
[70] Alexander J. Smola,et al. Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization , 2016, NIPS.
[71] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[72] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.
[73] Peng Xu,et al. Sub-sampled Newton Methods with Non-uniform Sampling , 2016, NIPS.
[74] Yi Zhou,et al. SpiderBoost: A Class of Faster Variance-reduced Algorithms for Nonconvex Optimization , 2018, ArXiv.
[75] Aryan Mokhtari,et al. RES: Regularized Stochastic BFGS Algorithm , 2014, IEEE Transactions on Signal Processing.
[76] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[77] Heinz H. Bauschke,et al. Convex Analysis and Monotone Operator Theory in Hilbert Spaces , 2011, CMS Books in Mathematics.
[78] Panagiotis Patrinos,et al. Forward-Backward Envelope for the Sum of Two Nonconvex Functions: Further Properties and Nonmonotone Linesearch Algorithms , 2016, SIAM J. Optim..
[79] Julien Mairal,et al. Optimization with Sparsity-Inducing Penalties , 2011, Found. Trends Mach. Learn..
[80] David Barber,et al. Practical Gauss-Newton Optimisation for Deep Learning , 2017, ICML.
[81] Shai Shalev-Shwartz,et al. Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..
[82] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[83] Patrick L. Combettes,et al. Proximal Splitting Methods in Signal Processing , 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.
[84] Bruce W. Suter,et al. Extragradient Method in Optimization: Convergence and Complexity , 2016, J. Optim. Theory Appl..
[85] Pradeep Ravikumar,et al. QUIC: quadratic approximation for sparse inverse covariance estimation , 2014, J. Mach. Learn. Res..
[86] Michael A. Saunders,et al. Proximal Newton-Type Methods for Minimizing Composite Functions , 2012, SIAM J. Optim..
[87] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[88] Jianfeng Gao,et al. Scalable training of L1-regularized log-linear models , 2007, ICML '07.
[89] Cho-Jui Hsieh,et al. Fast Variance Reduction Method with Stochastic Batch Size , 2018, ICML.
[90] Zeyuan Allen Zhu,et al. Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.
[91] Shiqian Ma,et al. Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization , 2014, SIAM J. Optim..
[92] Dmitriy Drusvyatskiy,et al. Stochastic subgradient method converges at the rate O(k-1/4) on weakly convex functions , 2018, ArXiv.
[93] Wotao Yin,et al. A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression , 2010, J. Mach. Learn. Res..
[94] Jorge Nocedal,et al. A Multi-Batch L-BFGS Method for Machine Learning , 2016, NIPS.
[95] Andrea Montanari,et al. Convergence rates of sub-sampled Newton methods , 2015, NIPS.
[96] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[97] Jorge Nocedal,et al. A Stochastic Quasi-Newton Method for Large-Scale Optimization , 2014, SIAM J. Optim..
[98] Aurélien Lucchi,et al. Sub-sampled Cubic Regularization for Non-convex Optimization , 2017, ICML.
[99] Michael I. Jordan,et al. Non-convex Finite-Sum Optimization Via SCSG Methods , 2017, NIPS.
[100] Anton Rodomanov,et al. A Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums , 2016, ICML.
[101] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[102] Haishan Ye,et al. Approximate Newton Methods and Their Local Convergence , 2017, ICML.
[103] Dong Yu,et al. Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..
[104] Yurii Nesterov,et al. Gradient methods for minimizing composite functions , 2012, Mathematical Programming.
[105] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[106] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[107] Mojmir Mutny,et al. Stochastic Second-Order Optimization via von Neumann Series , 2016, ArXiv.
[108] Michael W. Mahoney,et al. Sub-sampled Newton methods , 2018, Math. Program..
[109] Emmanuel J. Candès,et al. Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..
[110] H. Robbins,et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .
[111] A. Bemporad,et al. Forward-backward truncated Newton methods for convex composite optimization , 2014, arXiv:1402.6655.
[112] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[113] Francis Bach,et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives , 2014, NIPS.
[114] Christopher M. Bishop,et al. Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .
[115] Peter L. Bartlett,et al. Boosting Algorithms as Gradient Descent in Function Space , 2007 .
[116] Zeyuan Allen Zhu,et al. Katyusha: the first direct acceleration of stochastic gradient methods , 2017, STOC.
[117] Rujie Liu,et al. Large Scale Optimization with Proximal Stochastic Newton-Type Gradient Descent , 2015, ECML/PKDD.
[118] Panagiotis Patrinos,et al. Forward–backward quasi-Newton methods for nonsmooth optimization problems , 2016, Computational Optimization and Applications.
[119] Jorge Nocedal,et al. An investigation of Newton-Sketch and subsampled Newton methods , 2017, Optim. Methods Softw..
[120] Alfredo N. Iusem,et al. Extragradient Method with Variance Reduction for Stochastic Variational Inequalities , 2017, SIAM J. Optim..
[121] Robert M. Gower,et al. Stochastic Block BFGS: Squeezing More Curvature out of Data , 2016, ICML.
[122] J. Nocedal,et al. Exact and Inexact Subsampled Newton Methods for Optimization , 2016, arXiv:1609.08502.