Online Learning with Predictable Sequences

We present methods for online linear optimization that take advantage of benign (as opposed to worst-case) sequences. Specifically, if the sequence encountered by the learner is well described by a known "predictable process", the presented algorithms enjoy tighter bounds than the typical worst-case bounds. At the same time, the methods still achieve the usual worst-case regret bounds if the sequence is not benign. Our approach can be seen as a way of adding prior knowledge about the sequence within the paradigm of online learning. The setting is shown to encompass partial and side information. Variance and path-length bounds [11, 9] can be seen as particular examples of online learning with simple predictable sequences. We further extend our methods and results to competing with a set of possible predictable processes (models), that is, "learning" the predictable process itself concurrently with using it to obtain better regret guarantees. We show that such model selection is possible under various assumptions on the available feedback. Our results suggest a promising direction of further research, with potential applications to stock market and time-series prediction.
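To make the idea concrete, below is a minimal sketch of an optimistic (hint-aware) online gradient descent update, i.e., the Euclidean special case of mirror descent with a predictable sequence: before each round the learner shifts its iterate against a prediction M_t of the upcoming loss vector, and after the round it corrects with the observed loss vector x_t. The function names, the Euclidean-ball constraint, and the fixed step size are illustrative assumptions rather than the paper's exact construction; the point of the hint step is that regret then scales with how well the hints M_t track the true loss vectors, instead of with the raw magnitude of the sequence.

```python
import numpy as np

def project_ball(v, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def optimistic_ogd(losses, hints, eta=0.1, radius=1.0):
    """Optimistic online gradient descent over a Euclidean ball (illustrative sketch).

    losses: sequence of loss vectors x_t, revealed only after playing.
    hints:  sequence of predictions M_t of x_t, available before playing.
    Each round plays f_t using the hint, then updates a secondary iterate g_t
    with the observed loss vector.
    """
    d = len(losses[0])
    g = np.zeros(d)                      # secondary ("lazy") iterate
    total_loss = 0.0
    for x_t, m_t in zip(losses, hints):
        # Optimistic step: move against the hint M_t, then project.
        f = project_ball(g - eta * np.asarray(m_t), radius)
        total_loss += float(np.dot(f, x_t))
        # Standard step: update the secondary iterate with the true loss x_t.
        g = project_ball(g - eta * np.asarray(x_t), radius)
    return total_loss

# Example: hints equal to the previous loss vector recover a path-length-type bound.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = [rng.normal(size=3) for _ in range(100)]
    ms = [np.zeros(3)] + xs[:-1]         # "yesterday's" loss as the hint
    print(optimistic_ogd(xs, ms))
```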

[1] Elad Hazan, et al. Interior-Point Methods for Full-Information and Bandit Online Learning, 2012, IEEE Transactions on Information Theory.

[2] Rong Jin, et al. Online Optimization with Gradual Variations, 2012, COLT (25th Annual Conference on Learning Theory).

[3] Elad Hazan, et al. Extracting certainty from uncertainty: regret bounded by variation in costs, 2008, Machine Learning.

[4] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.

[5] Ohad Shamir, et al. Relax and Localize: From Value to Algorithms, 2012, arXiv.

[6] Ambuj Tewari, et al. Online Learning: Random Averages, Combinatorial Parameters, and Learnability, 2010, NIPS.

[7] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[8] Elad Hazan, et al. Better Algorithms for Benign Bandits, 2009, J. Mach. Learn. Res.

[9] Peter L. Bartlett, et al. Adaptive Online Gradient Descent, 2007, NIPS.

[10] A. Nemirovski, et al. Interior-point methods for optimization, 2008, Acta Numerica.

[11] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.

[12] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[13] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, Journal of Computer and System Sciences.

[14] Ambuj Tewari, et al. Online Learning: Stochastic, Constrained, and Smoothed Adversaries, 2011, NIPS.

[15] Jacob D. Abernethy, et al. Beating the adaptive bandit with high probability, 2009, Information Theory and Applications Workshop.