Online Learning with a Hint

We study a variant of online linear optimization where the player receives a hint about the loss function at the beginning of each round. The hint is given in the form of a vector that is weakly correlated with the loss vector on that round. We show that the player can benefit from such a hint if the set of feasible actions is sufficiently round. Specifically, if the set is strongly convex, the hint can be used to guarantee a regret of $O(\log T)$, and if the set is $q$-uniformly convex for $q \in (2,3)$, the hint can be used to guarantee a regret of $o(\sqrt{T})$. In contrast, we establish $\Omega(\sqrt{T})$ lower bounds on regret when the set of feasible actions is a polyhedron.
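For concreteness, the following is a minimal sketch of the underlying setting: the standard regret measure in online linear optimization and one common way to formalize a hint that is "weakly correlated" with the loss. The symbols $K$, $c_t$, $h_t$, and $\alpha$ are notational assumptions introduced here for illustration, not quotations from the paper.

% Online linear optimization over a feasible action set K in R^d.
% On round t the player picks x_t in K, then the loss vector c_t is revealed,
% and regret is measured against the best fixed action in hindsight:
\[
  \mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \langle c_t, x_t \rangle \;-\; \min_{x \in K} \sum_{t=1}^{T} \langle c_t, x \rangle .
\]
% One standard formalization of the hint (assumed here): h_t is a unit vector,
% revealed before x_t is chosen, whose correlation with the loss is bounded
% below by some alpha > 0,
\[
  \langle c_t, h_t \rangle \;\ge\; \alpha \,\lVert c_t \rVert_2 \qquad \text{for all } t .
\]

Under a formalization of this kind, the abstract's results can be read as regret bounds whose dependence on $T$ improves with the curvature of $K$: logarithmic when $K$ is strongly convex, $o(\sqrt{T})$ when $K$ is $q$-uniformly convex with $q \in (2,3)$, and no improvement over $\Omega(\sqrt{T})$ when $K$ is a polyhedron.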
