Paper Information - Regret Minimization Under Partial Monitoring

Regret Minimization Under Partial Monitoring

We consider repeated games in which the player, instead of observing the action chosen by the opponent in each game round, receives feedback generated by the combined choice of the two players. We study Hannan consistent players for these games, that is, randomized playing strategies whose per-round regret vanishes with probability one as the number of game rounds goes to infinity. We prove a general lower bound for the convergence rate of the regret, and exhibit a specific strategy that attains this rate for any game for which a Hannan consistent player exists.
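The notion of Hannan consistency used in the abstract can be written out as follows. This is a sketch in standard notation that the abstract itself does not fix: the symbols $N$ (number of player actions), $\ell$ (loss function), $h$ (feedback function), $I_t$ and $J_t$ (the player's and the opponent's actions in round $t$), and $n$ (number of rounds) are all assumptions introduced here for illustration.

```latex
% Assumed notation (not taken from the abstract):
%   I_t \in \{1,\dots,N\}  : player's (randomized) action in round t
%   J_t                    : opponent's action in round t
%   \ell(i,j)              : loss of player action i against opponent action j
%   h(i,j)                 : feedback revealed to the player instead of J_t
\[
  \text{per-round regret after } n \text{ rounds:}\qquad
  \frac{1}{n}\left(
    \sum_{t=1}^{n} \ell(I_t, J_t)
    \;-\;
    \min_{i=1,\dots,N} \sum_{t=1}^{n} \ell(i, J_t)
  \right)
\]
\[
  \text{Hannan consistency:}\qquad
  \limsup_{n\to\infty}\;
  \frac{1}{n}\left(
    \sum_{t=1}^{n} \ell(I_t, J_t)
    \;-\;
    \min_{i=1,\dots,N} \sum_{t=1}^{n} \ell(i, J_t)
  \right) \le 0
  \quad\text{with probability one.}
\]
```

In the partial monitoring setting the player does not see $J_t$ or the incurred loss after round $t$, only the feedback $h(I_t, J_t)$, so the sums above cannot be computed directly by the player and must be estimated from the feedback.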
