Online learning with queries

The online learning problem requires a player to iteratively choose an action in an unknown and changing environment. In the standard setting of this problem, the player must choose an action in each round before learning anything about the corresponding loss. In some situations, however, the player may be able to spend effort or resources to collect some information before acting. This motivates us to study a variant of the online learning problem in which the player may query <i>B</i> bits of the loss vector in each round before choosing her action. Suppose each loss value is represented by <i>K</i> bits, distinct loss values differ by at least some amount Δ, and there are <i>N</i> actions to choose from and <i>T</i> rounds to play. We provide an algorithm whose regret behaves as follows. Until <i>B</i> approaches <i>B</i><sub>1</sub> = <i>NK</i>/2, the regret stays at <i>O</i>(√(<i>T</i> ln <i>N</i>)); once <i>B</i> exceeds <i>B</i><sub>1</sub> but before it approaches <i>B</i><sub>2</sub> = <i>NK</i>/2 + 3<i>K</i>/2 − 1, the regret drops slightly to <i>O</i>(√(<i>T</i> ln <i>N</i>)/<i>N</i>); and once <i>B</i> exceeds <i>B</i><sub>2</sub>, the regret drops dramatically to (<i>N</i> ln <i>N</i>)/Δ, which no longer depends on <i>T</i>. Our algorithm is close to optimal: we also provide regret lower bounds that almost match the upper bounds it achieves.
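The three-regime statement can be made concrete with a small calculator. This is a minimal sketch, not the paper's algorithm. It computes <i>B</i><sub>1</sub> and <i>B</i><sub>2</sub> from <i>N</i> and <i>K</i> and reports which regret bound applies to a given per-round budget <i>B</i>. The helper names `query_thresholds` and `regret_regime` are hypothetical, and the returned values are orders of magnitude with the constants hidden by <i>O</i>(·) dropped. The first bound is read as √(<i>T</i> ln <i>N</i>), which matches the form of the second. "Approaching" a threshold is treated as a strict inequality, which is also an assumption.

```python
import math
from typing import NamedTuple, Tuple


class RegretRegime(NamedTuple):
    name: str     # which of the three regimes the budget B falls into
    bound: float  # order of the regret bound; constants hidden by O(.) are dropped


def query_thresholds(n_actions: int, k_bits: int) -> Tuple[float, float]:
    """Return (B1, B2) = (NK/2, NK/2 + 3K/2 - 1), the thresholds stated above."""
    b1 = n_actions * k_bits / 2
    b2 = b1 + 3 * k_bits / 2 - 1
    return b1, b2


def regret_regime(n_actions: int, k_bits: int, t_rounds: int,
                  delta: float, budget_bits: int) -> RegretRegime:
    """Map a per-round query budget B to its regret regime (hypothetical helper).

    Assumption: "before approaching B1" is read as B < B1, and likewise for B2.
    """
    b1, b2 = query_thresholds(n_actions, k_bits)
    log_n = math.log(n_actions)
    if budget_bits < b1:
        # Below B1: the bound stays at the standard order sqrt(T ln N).
        return RegretRegime("standard", math.sqrt(t_rounds * log_n))
    if budget_bits < b2:
        # Between B1 and B2: the bound shrinks by a factor of N.
        return RegretRegime("slight drop", math.sqrt(t_rounds * log_n) / n_actions)
    # At or beyond B2: the bound (N ln N)/delta no longer depends on T.
    return RegretRegime("dramatic drop", n_actions * log_n / delta)


if __name__ == "__main__":
    print(query_thresholds(4, 8))  # thresholds for N = 4 actions, K = 8 bits
    for b in (10, 20, 27):
        print(b, regret_regime(4, 8, 10**8, 1 / 255, b))
```

Consider <i>N</i> = 4 and <i>K</i> = 8. The thresholds come out to <i>B</i><sub>1</sub> = 16 and <i>B</i><sub>2</sub> = 27 bits per round, out of <i>NK</i> = 32 bits in the whole loss vector. The value Δ = 1/255 in the usage lines is an arbitrary illustrative choice. The last bound does not grow with <i>T</i>, so for large enough <i>T</i> it is the smallest of the three.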
