Regret Minimization Under Partial Monitoring

We consider repeated games in which the player, instead of observing the action chosen by the opponent in each round, receives feedback generated by the combined choice of the two players. We study Hannan-consistent players for these games, that is, randomized playing strategies whose per-round regret vanishes with probability one as the number of game rounds goes to infinity. We prove a general lower bound on the convergence rate of the regret and exhibit a specific strategy that attains this rate for any game for which a Hannan-consistent player exists.
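To make the quantities in the abstract concrete, the following minimal sketch computes the realized per-round regret of a sequence of plays against the best fixed action in hindsight, and shows how a feedback matrix can hide the opponent's action from the player. The 2x2 loss matrix, the feedback symbols, and the names `per_round_regret` and `observe` are hypothetical illustrations, not notation or results taken from the paper.

```python
from typing import Sequence


def per_round_regret(
    loss: Sequence[Sequence[float]],
    player_actions: Sequence[int],
    opponent_actions: Sequence[int],
) -> float:
    """Average regret against the best fixed player action in hindsight.

    loss[i][j] is the loss when the player picks i and the opponent picks j.
    """
    n = len(player_actions)
    if n == 0 or n != len(opponent_actions):
        raise ValueError("action sequences must be non-empty and of equal length")
    # Cumulative loss actually incurred by the player.
    incurred = sum(loss[i][j] for i, j in zip(player_actions, opponent_actions))
    # Cumulative loss of the best single action played in every round.
    best_fixed = min(sum(row[j] for j in opponent_actions) for row in loss)
    return (incurred - best_fixed) / n


def observe(feedback: Sequence[Sequence[str]], i: int, j: int) -> str:
    """Signal the player receives; it depends on both players' choices."""
    return feedback[i][j]


# Hypothetical game (an assumption for illustration only).
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]
# Playing action 0 yields the same signal whatever the opponent does,
# while playing action 1 reveals the opponent's action.
FEEDBACK = [["a", "a"],
            ["b", "c"]]

regret = per_round_regret(LOSS, [0, 0, 0, 1], [0, 1, 1, 1])
```

In this sketch the regret is computed from the full action history, which is the analyst's view; the player itself only ever sees the values returned by `observe`. A Hannan-consistent strategy is one for which this per-round quantity becomes non-positive in the limit, with probability one, against every opponent sequence.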
