Online Learning with Local Permutations and Delayed Feedback

We propose the Online Learning with Local Permutations (OLLP) setting, in which the learner is allowed to slightly permute the \emph{order} of the loss functions generated by an adversary. On the one hand, this models natural situations in which the exact order of the learner's responses is not crucial; on the other hand, it may allow better learning and regret performance by mitigating highly adversarial loss sequences. Moreover, with random permutations, it can be seen as a setting that interpolates between adversarial and stochastic losses. In this paper, we consider the applicability of this setting to convex online learning with delayed feedback, in which the feedback on the prediction made in round $t$ arrives with some delay $\tau$. With such delayed feedback, the best possible regret bound is well known to be $O(\sqrt{\tau T})$. We prove that when losses can be permuted by a distance of at most $M$ (for $M\geq \tau$), the regret can be improved to $O(\sqrt{T}(1+\sqrt{\tau^2/M}))$, using a Mirror Descent-based algorithm that applies to both Euclidean and non-Euclidean geometries. We also prove a lower bound showing that for $M<\tau/3$, it is impossible to improve the standard $O(\sqrt{\tau T})$ regret bound by more than constant factors. Finally, we provide experiments validating the performance of our algorithm.
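
To make the two regret rates in the abstract concrete, the following minimal sketch evaluates their growth terms with all constants dropped. It also checks whether a reordering moves every loss by at most $M$ positions, which is one natural reading of "permute losses by a distance of at most $M$" and is an assumption here, not the paper's formal definition. The function names (`delayed_rate`, `permuted_rate`, `is_local_permutation`) are hypothetical and used only for illustration.

```python
import math


def delayed_rate(T, tau):
    """Growth term sqrt(tau * T) of the standard delayed-feedback bound (constants dropped)."""
    return math.sqrt(tau * T)


def permuted_rate(T, tau, M):
    """Growth term sqrt(T) * (1 + sqrt(tau^2 / M)) of the bound with local permutations."""
    if M < tau:
        # The abstract states the improved bound only for M >= tau.
        raise ValueError("the improved bound is stated for M >= tau")
    return math.sqrt(T) * (1.0 + math.sqrt(tau ** 2 / M))


def is_local_permutation(order, M):
    """Return True if `order` is a permutation of 0..n-1 that moves each index by at most M.

    Assumption: "distance at most M" is read as |order[t] - t| <= M for every round t.
    """
    n = len(order)
    if sorted(order) != list(range(n)):
        return False
    return all(abs(order[t] - t) <= M for t in range(n))


if __name__ == "__main__":
    T, tau = 10_000, 16
    print("delayed feedback only:", delayed_rate(T, tau))
    for M in (tau, tau ** 2, 4 * tau ** 2):
        print(f"M = {M}:", permuted_rate(T, tau, M))
    print(is_local_permutation([1, 0, 3, 2], M=1))
```

Under these formulas, $M = \tau$ gives $\sqrt{T}(1+\sqrt{\tau})$, which is of the same order as $\sqrt{\tau T}$, while any $M \geq \tau^2$ gives a term of order $\sqrt{T}$, so the dependence on the delay disappears up to constants.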
