Online Learning with Imperfect Hints

We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round. Rather surprisingly, it was shown that if the hint vector is guaranteed to have a positive correlation with the cost vector, then the online player can achieve a regret of $O(\log T)$, thus significantly improving over the $O(\sqrt{T})$ regret in the general setting. However, the result and analysis require the correlation property at all time steps, thus raising the natural question: can we design online learning algorithms that are resilient to bad hints? In this paper we develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints. Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case. Our results also generalize, simplify, and improve upon previous results on optimistic regret bounds, which can be viewed as an additive version of hints.
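To make the setting in the abstract concrete, the following is a minimal Python sketch of the hint-free baseline only; it is not the algorithm developed in the paper. It assumes the player acts in the Euclidean unit ball, measures regret against the best fixed point of that ball, and uses projected online gradient descent with step size $1/\sqrt{t}$, the standard route to $O(\sqrt{T})$ regret. The helper names (`run_ogd`, `regret`, `fraction_good_hints`, `random_unit_costs`) and the threshold `alpha` are hypothetical and chosen for illustration; in particular, "positively correlated" is formalized here as $\langle c_t, h_t \rangle \ge \alpha \|c_t\|^2$, which is one plausible reading rather than a definition taken from the text above.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def project_to_unit_ball(x):
    """Scale x back onto the Euclidean unit ball if it lies outside."""
    norm = math.sqrt(dot(x, x))
    return list(x) if norm <= 1.0 else [v / norm for v in x]


def regret(actions, costs):
    """Total incurred cost minus the cost of the best fixed unit-ball point."""
    incurred = sum(dot(x, c) for x, c in zip(actions, costs))
    total = [sum(col) for col in zip(*costs)]
    # min over ||u|| <= 1 of <u, sum_t c_t> equals -||sum_t c_t||.
    best_fixed = -math.sqrt(dot(total, total))
    return incurred - best_fixed


def run_ogd(costs):
    """Hint-free baseline: projected online gradient descent, step 1/sqrt(t)."""
    x = [0.0] * len(costs[0])
    actions = []
    for t, c in enumerate(costs, start=1):
        actions.append(x)  # the action is committed before c is revealed
        eta = 1.0 / math.sqrt(t)
        x = project_to_unit_ball([xi - eta * ci for xi, ci in zip(x, c)])
    return actions


def fraction_good_hints(hints, costs, alpha):
    """Share of rounds where <c_t, h_t> >= alpha * ||c_t||^2 (assumed criterion)."""
    good = sum(1 for h, c in zip(hints, costs) if dot(c, h) >= alpha * dot(c, c))
    return good / len(costs)


def random_unit_costs(T, d, rng):
    """Illustrative cost sequence: T random unit vectors in d dimensions."""
    costs = []
    for _ in range(T):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        costs.append([vi / norm for vi in v])
    return costs


if __name__ == "__main__":
    rng = random.Random(0)
    costs = random_unit_costs(1000, 3, rng)
    print("OGD regret without hints:", regret(run_ogd(costs), costs))
```

A hint-aware algorithm would additionally receive `hints[t]` before choosing `actions[t]`; `fraction_good_hints` only measures how often a given hint sequence satisfies the assumed correlation criterion, which is the quantity the abstract's interpolation between the always-correlated and no-hints cases depends on.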

Aditya Bhaskara | Manish Purohit | Ravi Kumar | Ashok Cutkosky
