Adaptive Sequential Stochastic Optimization

A framework is introduced for sequentially solving convex stochastic minimization problems whose objective functions change slowly over time, in the sense that the distance between successive minimizers is bounded. Each minimization problem is solved by applying a selected stochastic optimization algorithm, such as stochastic gradient descent, drawing a number of samples at each time step to carry out the iterations. Two tracking criteria are introduced to evaluate the quality of an approximate minimizer: one requires accuracy with respect to the mean trajectory, and the other requires accuracy in high probability. An estimated bound on the change in minimizers, combined with convergence properties of the chosen optimization algorithm, is used to select the number of samples needed to meet the desired tracking criterion. A technique for estimating the change in minimizers is provided, together with an analysis showing that the estimate eventually upper bounds the true change. This estimate yields sample-size selection rules that guarantee the tracking criterion is met for all sufficiently large time steps. Simulations confirm that the estimation approach achieves the desired tracking accuracy in practice while remaining efficient in the number of samples used at each time step.
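
To make the sample-size selection loop concrete, the sketch below instantiates the idea on a drifting quadratic objective f_t(x) = E[0.5*||x - y||^2] with samples y = theta_t + Gaussian noise, so the time-t minimizer is theta_t. This is a minimal illustration under stated assumptions, not the paper's method: the helper select_sample_size, the drift estimate rho_hat, and its 2*eps slack are hypothetical names and a simplified stand-in for the paper's estimator, and the iteration count comes from the standard constant-step SGD bound for this quadratic rather than the paper's general analysis.

```python
import numpy as np

# Sketch of adaptive sequential stochastic optimization on a drifting
# quadratic f_t(x) = E_y[0.5 * ||x - y||^2], with y = theta_t + Gaussian
# noise, so the time-t minimizer is theta_t and the per-step drift
# ||theta_t - theta_{t-1}|| equals rho_true. Names and the drift estimator
# below are illustrative assumptions, not the paper's notation.

rng = np.random.default_rng(0)
d, sigma = 5, 1.0    # dimension and per-coordinate sample noise level
eps = 0.3            # tracking target (mean criterion): E||x_t - theta_t||^2 <= eps^2
rho_true = 0.05      # true per-step drift of the minimizer (hidden from the algorithm)
T = 50               # number of time steps

def select_sample_size(r0_sq):
    """Choose a constant SGD step size alpha and iteration count K so that,
    starting from squared error r0_sq, E||x_K - theta_t||^2 <= eps^2.
    Uses the standard bound for this quadratic:
      E||x_K - theta||^2 <= (1 - alpha)^(2K) * r0_sq + alpha * d * sigma**2 / (2 - alpha)."""
    alpha = 2 * eps**2 / (2 * d * sigma**2 + eps**2)  # sets the noise floor to eps^2 / 2
    if r0_sq <= eps**2 / 2:
        return alpha, 1
    K = int(np.ceil(np.log(2 * r0_sq / eps**2) / (-2.0 * np.log(1.0 - alpha))))
    return alpha, K

theta = np.zeros(d)        # true minimizer (unknown to the algorithm)
x = rng.normal(size=d)     # initial iterate
rho_hat = 1.0              # pessimistic initial drift estimate
total_samples = 0
for t in range(T):
    step = rng.normal(size=d)                  # minimizer drifts by exactly rho_true
    theta = theta + rho_true * step / np.linalg.norm(step)

    # Warm start: if the previous step met the criterion, the distance to the
    # new minimizer is roughly at most eps + rho_hat.
    alpha, K = select_sample_size((eps + rho_hat) ** 2)
    x_prev = x
    for _ in range(K):                         # one fresh sample per SGD iteration
        y = theta + sigma * rng.normal(size=d)
        x = x - alpha * (x - y)                # gradient of 0.5 * ||x - y||^2
    total_samples += K

    # Crude drift estimate: change in approximate minimizers plus 2*eps slack
    # for their own inaccuracy (a simplified stand-in for the paper's
    # estimator, which is shown to eventually upper-bound the true drift).
    rho_hat = np.linalg.norm(x - x_prev) + 2 * eps

print(f"avg samples/step: {total_samples / T:.1f}, "
      f"final tracking error: {np.linalg.norm(x - theta):.3f}")
```

Because the warm start from the previous iterate is within roughly eps + rho_hat of the new minimizer, the per-step iteration count stays modest once rho_hat settles near the true drift, which is the sample-efficiency behavior the abstract describes.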
