Bounds for Regret-Matching Algorithms

We introduce a general class of learning algorithms, regret-matching algorithms, and a regret-based framework for analyzing their performance in online decision problems. Our analytic framework is based on a set Φ of transformations over the set of actions. Specifically, we calculate a Φ-regret vector by comparing the average reward obtained by an agent over some finite sequence of rounds to the average reward that could have been obtained had the agent instead played each transformation φ ∈ Φ of its sequence of actions. The regret-matching algorithms analyzed here select the agent's next action based on the vector of Φ-regrets together with a link function f. Many well-studied learning algorithms arise as instances of regret matching. We derive bounds on the regret experienced by (f, Φ)-regret-matching algorithms for polynomial and exponential link functions; for polynomial link functions, our bounds hold for all p > 1 rather than only p ≥ 2. Although we do not improve upon the bounds reported in past work (except in special cases), our method of analysis is more general, in part because it does not rely directly on Taylor's theorem. Hence, we can analyze algorithms based on a larger class of link functions, including non-differentiable ones. In ongoing work, we are studying regret matching with link functions beyond the polynomial and exponential families.
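To make the framework concrete, the sketch below instantiates (f, Φ)-regret matching for the simplest choice of Φ: the constant transformations (each φ_a maps every action to a), which induce external regret. The class name, parameter defaults, and both link functions are illustrative assumptions on our part, not code from the paper; with p = 2 the polynomial link gives the familiar play-proportional-to-positive-regret rule, while the exponential link produces Hedge-style exponential weights.

```python
import numpy as np

def poly_link(x, p=2.0):
    """Polynomial link: f(x) = max(x, 0)^(p - 1), defined for any p > 1."""
    return np.maximum(x, 0.0) ** (p - 1.0)

def exp_link(x, eta=0.1):
    """Exponential link: f(x) = exp(eta * x)."""
    return np.exp(eta * x)

class ExternalRegretMatcher:
    """Sketch of (f, Phi)-regret matching where Phi is the set of constant
    transformations (phi_a maps every action to a), i.e. external regret."""

    def __init__(self, n_actions, link=poly_link):
        self.n = n_actions
        self.link = link
        # cum_regret[a] = total reward of always playing a, minus the
        # total reward actually obtained, accumulated over all rounds
        self.cum_regret = np.zeros(n_actions)

    def action_probs(self):
        """Next-action distribution: proportional to the link function
        applied to the regret vector; uniform if no weight is positive."""
        w = self.link(self.cum_regret)
        total = w.sum()
        if total <= 0.0:
            return np.full(self.n, 1.0 / self.n)
        return w / total

    def update(self, played, rewards):
        """rewards[a] = reward (as a NumPy array entry) that action a
        would have earned this round; `played` is the action taken."""
        self.cum_regret += rewards - rewards[played]

# Usage: average Phi-regret should shrink as rounds accumulate.
rng = np.random.default_rng(0)
agent = ExternalRegretMatcher(n_actions=3)
for t in range(1000):
    a = rng.choice(agent.n, p=agent.action_probs())
    rewards = rng.random(agent.n)  # an adversary could choose these instead
    agent.update(a, rewards)
print(agent.cum_regret / 1000)
```

Swapping `poly_link` for `exp_link` changes only the weighting step; the normalization in `action_probs` then reproduces exponential-weights behavior, illustrating the point above that familiar algorithms arise as instances of regret matching under different link functions.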
