Online Learning for Changing Environments using Coin Betting

A key challenge in online learning is that classical algorithms can be slow to adapt to changing environments. Recent studies have proposed "meta" algorithms that convert any online learning algorithm into one that adapts to changing environments, where the adaptivity is measured by a quantity called the strongly-adaptive regret. This paper describes a new meta algorithm whose strongly-adaptive regret bound is a factor of $\sqrt{\log(T)}$ better than that of other algorithms with the same time complexity, where $T$ is the time horizon. We also extend our algorithm to achieve, to our knowledge for the first time, a first-order (i.e., dependent on the observed losses) strongly-adaptive regret bound. At its heart is a new parameter-free algorithm for the learning with expert advice (LEA) problem in which experts sometimes do not output advice for consecutive time steps (i.e., \emph{sleeping} experts). This algorithm is derived by a reduction from optimal algorithms for the so-called coin betting problem. Empirical results show that our algorithm outperforms state-of-the-art methods in both learning with expert advice and metric learning scenarios.
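To make the coin-betting-to-LEA reduction mentioned above concrete, the following is a minimal sketch (not the paper's algorithm) of the standard Krichevsky–Trofimov (KT) coin-betting construction for learning with expert advice, in the spirit of Orabona and Pál (2016): each expert is assigned a KT bettor whose "coin outcomes" are the learner's instantaneous regrets, and the learner's weights are proportional to the positive part of each bettor's bet. All function and variable names here are illustrative assumptions, not from the paper.

```python
import numpy as np

def coin_betting_lea(loss_matrix, prior=None):
    """Sketch of coin-betting-based learning with expert advice (LEA).

    loss_matrix: (T, N) array of expert losses in [0, 1].
    Returns the learner's per-round losses (mixture of expert losses).
    Illustrative sketch of the KT reduction; names are hypothetical.
    """
    T, N = loss_matrix.shape
    if prior is None:
        prior = np.full(N, 1.0 / N)
    wealth = np.ones(N)        # each expert's bettor starts with wealth 1
    cum_coins = np.zeros(N)    # running sum of observed coin outcomes
    learner_losses = []
    for t in range(1, T + 1):
        # KT betting fraction: bet a (sum of past coins)/t fraction of wealth
        bets = (cum_coins / t) * wealth
        p = prior * np.maximum(bets, 0.0)       # keep only positive bets
        p = p / p.sum() if p.sum() > 0 else prior  # fall back to the prior
        losses = loss_matrix[t - 1]
        learner_loss = p @ losses
        learner_losses.append(learner_loss)
        coins = learner_loss - losses           # instantaneous regrets in [-1, 1]
        wealth += coins * bets                  # each bettor's wealth update
        cum_coins += coins
    return np.array(learner_losses)
```

Run on a sequence where one expert is clearly best, the weights concentrate on that expert within a few rounds, so the learner's cumulative loss stays close to the best expert's. This is the vanilla (non-sleeping) setting; handling sleeping experts, as the paper's algorithm does, additionally requires restricting the weight update to rounds where each expert is awake.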
