From External to Internal Regret

External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares the loss of an online algorithm to the loss of a modified online algorithm that consistently replaces one action by another. In this paper, we give a simple generic reduction that, given an algorithm for the external regret problem, converts it into an efficient online algorithm for the internal regret problem. We provide methods that work both in the full information model, in which the loss of every action is observed at each time step, and in the partial information (bandit) model, where at each time step only the loss of the selected action is observed. The importance of internal regret in game theory stems from the fact that, in a general game, if each player has sublinear internal regret, then the empirical distribution of joint play converges to the set of correlated equilibria. For external regret we also derive a quantitative regret bound for a very general regret setting, which allows an arbitrary set of modification rules (rules that modify the actions chosen by the online algorithm) and an arbitrary set of time selection functions (each assigning a different weight to each time step). The regret for a given time selection function and modification rule is the difference between the cost of the online algorithm and the cost of the modified online algorithm, with both costs weighted by the time selection function. This setting can be viewed as a generalization of the previously studied sleeping experts setting.
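To make the regret notions above concrete, the standard formalization reads as follows; the notation is ours, chosen for illustration, and is not quoted from the paper. At each step t = 1, ..., T the online algorithm plays a distribution p^t over the N actions and then a loss vector \ell^t \in [0,1]^N is revealed, so the algorithm pays \sum_i p^t_i \ell^t_i. External regret is the gap to the best single action in hindsight, while internal (swap) regret is the gap to the best rule that replaces every play of one action i by another action j:

\[
R_{\mathrm{ext}} = \sum_{t=1}^{T}\sum_{i=1}^{N} p^{t}_{i}\,\ell^{t}_{i} \;-\; \min_{j}\sum_{t=1}^{T}\ell^{t}_{j},
\qquad
R_{\mathrm{int}} = \max_{i \neq j}\; \sum_{t=1}^{T} p^{t}_{i}\bigl(\ell^{t}_{i}-\ell^{t}_{j}\bigr).
\]

In the more general setting described at the end of the abstract, a time selection function I : \{1,\dots,T\} \to [0,1] weights the time steps and a modification rule F maps actions to actions, giving (under the same illustrative notation)

\[
R_{I,F} = \sum_{t=1}^{T} I(t)\,\sum_{i=1}^{N} p^{t}_{i}\bigl(\ell^{t}_{i}-\ell^{t}_{F(i)}\bigr).
\]

Roughly, external regret corresponds to constant modification rules with I \equiv 1, internal regret to rules that swap a single pair of actions, and sleeping-experts-style guarantees to particular choices of the time selection function I.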
