Cleaning up the neighborhood: A full classification for adversarial partial monitoring

Partial monitoring is a generalization of the well-known multi-armed bandit framework in which the loss is not directly observed by the learner. We complete the classification of finite adversarial partial monitoring to include all games, solving an open problem posed by Bartók et al. [2014]. Along the way we simplify and improve existing algorithms and correct errors in previous analyses. Our second contribution is a new algorithm for the class of games studied by Bartók [2013], for which we prove upper and lower regret bounds that shed more light on the dependence of the regret on the game structure.

[1] Vianney Perchet, et al. Approachability of Convex Sets in Games with Partial Monitoring, 2011, J. Optim. Theory Appl.

[2] Wei Chen, et al. Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications, 2014, ICML.

[3] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[4] Hiroshi Nakagawa, et al. Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring, 2015, NIPS.

[5] D. Freedman. On Tail Probabilities for Martingales, 1975.

[6] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[7] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[8] Csaba Szepesvári, et al. Partial Monitoring - Classification, Regret Bounds, and Algorithms, 2014, Math. Oper. Res.

[9] Shie Mannor, et al. Strategies for Prediction Under Imperfect Monitoring, 2007, Math. Oper. Res.

[10] Andrew Chi-Chih Yao, et al. Probabilistic computations: Toward a unified measure of complexity, 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).

[11] Shie Mannor, et al. Set-valued approachability and online learning with partial monitoring, 2014, J. Mach. Learn. Res.

[12] Alexandre B. Tsybakov, et al. Introduction to Nonparametric Estimation, 2008, Springer series in statistics.

[13] Csaba Szepesvári, et al. Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments, 2011, COLT.

[14] Vianney Perchet, et al. Online Learning and Blackwell Approachability with Partial Monitoring: Optimal Convergence Rates, 2017, AISTATS.

[15] Christian Schindelhauer, et al. Discrete Prediction Games with Arbitrary Feedback and Loss, 2001, COLT/EuroCOLT.

[16] Shie Mannor, et al. On-Line Learning with Imperfect Monitoring, 2003, COLT.

[17] A. Rustichini. Minimizing Regret: The General Case, 1999.

[18] Nicolò Cesa-Bianchi, et al. Regret Minimization Under Partial Monitoring, 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.

[19] Csaba Szepesvári, et al. Toward a classification of finite partial-monitoring games, 2010, Theor. Comput. Sci.

[20] Andreas Krause, et al. Efficient Partial Monitoring with Prior Information, 2014, NIPS.

[21] Nicolò Cesa-Bianchi, et al. Regret Minimization Under Partial Monitoring, 2006, ITW.

[22] Vianney Perchet, et al. Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms, 2011, J. Mach. Learn. Res.

[23] Dean P. Foster, et al. No Internal Regret via Neighborhood Watch, 2011, AISTATS.

[24] Gábor Bartók, et al. A near-optimal algorithm for finite partial-monitoring games against adversarial opponents, 2013, COLT.