A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms. In this paper, we contribute to this effort in two ways. First, based on a new regret decomposition and a generalization of Bregman divergences, we provide a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of Mirror Descent (MD) and Follow-the-Regularized-Leader (FTRL). The analysis is carried out with care so as not to introduce assumptions that are not needed in the proofs, and it allows different algorithmic ideas (e.g., adaptivity, optimism, implicit updates) and learning settings (e.g., strongly convex or composite objectives) to be combined in a straightforward way. In this way, we are able to reprove, extend, and refine a large body of results from the literature while keeping the proofs concise. The second contribution is a byproduct of this careful analysis: we present algorithms with improved variational bounds for smooth, composite objectives, including a new family of optimistic MD algorithms that require only one projection step per round. Furthermore, we provide a simple extension of adaptive regret bounds to practically relevant non-convex problem settings with essentially no extra effort.
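For concreteness, the following is a minimal sketch, not the paper's exact algorithm, of two of the ingredients mentioned above: adaptive step sizes and optimistic gradient hints, instantiated here for FTRL with a Euclidean regularizer, a simple last-gradient hint, and a single projection per round. All names (optimistic_adaptive_ftrl, grad_fn, project_l2_ball) and the specific step-size rule are illustrative assumptions.

    import numpy as np

    def project_l2_ball(v, radius=1.0):
        """Euclidean projection onto the centered L2 ball of the given radius."""
        norm = np.linalg.norm(v)
        return v if norm <= radius else v * (radius / norm)

    def optimistic_adaptive_ftrl(grad_fn, dim, num_rounds, radius=1.0, eta=1.0, eps=1e-8):
        """Illustrative adaptive optimistic FTRL sketch (Euclidean regularizer).

        grad_fn(t, x) must return a (sub)gradient of the round-t loss at x.
        The hint for round t+1 is simply the last observed gradient.
        """
        x = np.zeros(dim)            # point played in the current round
        grad_sum = np.zeros(dim)     # running sum of observed gradients
        hint = np.zeros(dim)         # optimistic guess of the upcoming gradient
        hint_err_sq = 0.0            # sum of squared hint errors, drives the step size
        iterates = []
        for t in range(num_rounds):
            g = grad_fn(t, x)                              # observe the round-t gradient
            hint_err_sq += float(np.dot(g - hint, g - hint))
            grad_sum += g
            hint = g                                       # last-gradient hint for round t+1
            step = eta / (np.sqrt(hint_err_sq) + eps)      # adaptive step size
            # Single projection per round: lazy FTRL update with the hint added in.
            x = project_l2_ball(-step * (grad_sum + hint), radius)
            iterates.append(x)
        return iterates

In the mirror-descent variants, the Euclidean projection above would be replaced by an update based on a (generalized) Bregman divergence induced by the chosen, possibly time-varying, regularizer.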
