We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the learner has exactly two actions available and the game is nontrivial, the game is reducible to a bandit-like game, and thus its minimax regret is $\Theta(\sqrt{T})$.
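For readers who want the quantity in the claim spelled out, the following is a sketch in standard partial-monitoring notation. The notation is assumed here and does not come from the abstract: $L$ is the loss matrix, $I_t$ is the learner's action and $J_t$ the adversary's outcome in round $t$, $T$ is the number of rounds, $N$ is the number of learner actions, and $\mathcal{A}$ ranges over learner strategies.

$$R_T = \sum_{t=1}^{T} L(I_t, J_t) - \min_{1 \le i \le N} \sum_{t=1}^{T} L(i, J_t), \qquad R_T^{*} = \inf_{\mathcal{A}} \sup_{J_1, \dots, J_T} \mathbb{E}\left[R_T\right].$$

Under these assumptions, the supremum runs over outcome sequences fixed in advance (the oblivious adversary), and the result reads: if $N = 2$ and the game is nontrivial, then $R_T^{*} = \Theta(\sqrt{T})$.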