Opportunistic Approachability and Generalized No-Regret Problems

Blackwell’s theory of approachability, introduced in 1956, has since proved a useful tool in the study of a range of repeated multiagent decision problems. Given a repeated matrix game with vector payoffs, a target set S is approachable by a certain player if that player can ensure that the average payoff vector converges to S, for any strategy of the opponent. In this paper we consider the case where a set need not be approachable in general, but may be approached if the opponent plays favorably in some sense. In particular, we consider nonconvex sets that satisfy Blackwell’s dual condition, namely, sets that can be approached when the opponent plays a stationary strategy. Whereas the convex hull of such a set is approachable, the original nonconvex set itself generally is not. We start by defining a sense of restricted play of the opponent (with stationary strategies being a special case), and then formulate appropriate goals for an opportunistic approachability algorithm that can take advantage ...
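The following is a concrete illustration of approachability. It is a standard special case, not the opportunistic algorithm of this paper. Let the player's vector payoff be the regret vector: component k is the reward that action k would have earned minus the reward actually earned. Approaching the nonpositive orthant S = {x : x ≤ 0} then means that average regret vanishes. In this case Blackwell's projection step reduces to regret matching (Hart and Mas-Colell): play each action with probability proportional to its positive cumulative regret.

The Python sketch below makes several assumptions:
- the game is finite, with a known scalar payoff matrix;
- the player fully observes the opponent's action;
- the opponent's moves are supplied as a fixed (oblivious) sequence.

The function name `regret_matching` and the matching-pennies payoffs are illustrative. Because S is convex here, the classical guarantee applies; the sketch says nothing about the nonconvex sets discussed above.

```python
import random


def regret_matching(payoff, opponent_moves, seed=0):
    """Blackwell's approachability strategy for the nonpositive orthant.

    payoff[i][j] is the player's scalar reward for action i against opponent
    action j. The stage vector payoff is the regret vector with components
    payoff[k][j] - payoff[i][j], one per alternative action k. Returns the
    average regret vector after len(opponent_moves) stages.
    """
    rng = random.Random(seed)
    n = len(payoff)
    cum_regret = [0.0] * n
    # The opponent's moves are fixed in advance here (an oblivious opponent)
    # for simplicity; the player only ever uses information from past stages.
    for j in opponent_moves:
        # Projecting the average regret onto {x <= 0} leaves its positive part
        # as the direction to S. Mixing in proportion to that part makes the
        # expected stage regret orthogonal to it, satisfying Blackwell's condition.
        positive = [max(r, 0.0) for r in cum_regret]
        total = sum(positive)
        if total > 0:
            probs = [p / total for p in positive]
        else:
            # Average regret already lies in S; any mixed action is acceptable.
            probs = [1.0 / n] * n
        i = rng.choices(range(n), weights=probs)[0]
        for k in range(n):
            cum_regret[k] += payoff[k][j] - payoff[i][j]
    return [r / len(opponent_moves) for r in cum_regret]


# Matching pennies from the player's perspective: +1 on a match, -1 otherwise.
PENNIES = [[1, -1], [-1, 1]]

# An opponent who always plays action 0.
print(regret_matching(PENNIES, [0] * 1000))
```

Against an opponent who always plays action 0, the player locks onto action 0 after at most one deviation, so the largest average regret is at most 2/T. Against a uniformly random opponent, the average regret typically shrinks on the order of 1/sqrt(T).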
