Paper Information - Efficient Partial Monitoring with Prior Information

Partial monitoring is a general model for online learning with limited feedback: a learner chooses actions in a sequential manner while an opponent chooses outcomes. In every round, the learner suffers some loss and receives some feedback based on the action and the outcome. The goal of the learner is to minimize her cumulative loss. Applications range from dynamic pricing to label-efficient prediction to dueling bandits. In this paper, we assume that we are given some prior information about the distribution based on which the opponent generates the outcomes. We propose BPM, a family of new efficient algorithms whose core is to track the outcome distribution with an ellipsoid centered around the estimated distribution. We show that our algorithm provably enjoys a near-optimal regret rate for locally observable partial-monitoring problems against stochastic opponents. As demonstrated with experiments on synthetic as well as real-world data, the algorithm outperforms previous approaches, even for very uninformed priors, with an order of magnitude smaller regret and lower running time.
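The interaction protocol described in the abstract can be sketched as a short simulation. This is only an illustration of the partial-monitoring setting against a stochastic opponent, not the BPM algorithm. The loss and feedback matrices (a toy label-efficient-prediction game), the `play` helper, and the `explore_then_commit` learner are hypothetical names and values chosen for this example, and regret is measured here as the expected-loss gap to the best fixed action.

```python
import random

# Hypothetical 3-action, 2-outcome game (assumed for illustration):
# action 0 asks for the label (always costs 1, but reveals the outcome);
# actions 1 and 2 predict outcome 0 or 1 and reveal nothing.
LOSS = [[1.0, 1.0],
        [0.0, 1.0],
        [1.0, 0.0]]
FEEDBACK = [["a", "b"],
            ["-", "-"],
            ["-", "-"]]


def play(choose_action, outcome_probs, rounds, seed=0):
    """Run the protocol against a stochastic opponent and return the
    learner's regret, measured in expected loss."""
    rng = random.Random(seed)
    expected_loss = [sum(p * l for p, l in zip(outcome_probs, row))
                     for row in LOSS]
    best = min(expected_loss)
    history = []  # (action, feedback) pairs: everything the learner observes
    regret = 0.0
    for _ in range(rounds):
        action = choose_action(history, rng)
        # The opponent draws an outcome from a fixed distribution.
        outcome = rng.choices(range(len(outcome_probs)),
                              weights=outcome_probs)[0]
        # The learner sees only the feedback symbol, never the loss itself.
        history.append((action, FEEDBACK[action][outcome]))
        regret += expected_loss[action] - best
    return regret


def explore_then_commit(history, rng, explore_rounds=50):
    """Naive baseline: ask for labels first, then predict the majority."""
    if len(history) < explore_rounds:
        return 0
    seen_a = sum(1 for a, f in history if a == 0 and f == "a")
    seen_b = sum(1 for a, f in history if a == 0 and f == "b")
    return 2 if seen_b > seen_a else 1
```

Calling `play(explore_then_commit, [0.2, 0.8], rounds=1000)` runs the baseline against an opponent that produces outcome 1 with probability 0.8. A learner that picks actions uniformly at random, such as `lambda history, rng: rng.randrange(3)`, can be passed in the same way for comparison.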

Andreas Krause | Gábor Bartók | Hastagiri P. Vanchinathan

[1] Vladimir Vovk, et al. Aggregating strategies, 1990, COLT '90.

[2] Thorsten Joachims, et al. Reducing Dueling Bandits to Cardinal Bandits, 2014, ICML.

[3] Andreas Krause, et al. Truthful incentives in crowdsourcing tasks using regret minimization mechanisms, 2013, WWW.

[4] Joel A. Tropp, et al. User-Friendly Tail Bounds for Sums of Random Matrices, 2010, Found. Comput. Math.

[5] Nicolò Cesa-Bianchi, et al. Regret Minimization Under Partial Monitoring, 2006, 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este.

[6] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT 2009.

[7] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.

[8] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[9] Gábor Bartók, et al. A near-optimal algorithm for finite partial-monitoring games against adversarial opponents, 2013, COLT.

[10] Csaba Szepesvári, et al. An adaptive algorithm for finite stochastic partial monitoring, 2012, ICML.

[11] Christian Schindelhauer, et al. Discrete Prediction Games with Arbitrary Feedback and Loss, 2001, COLT/EuroCOLT.

[12] Thorsten Joachims, et al. The K-armed Dueling Bandits Problem, 2012, COLT.

[13] Csaba Szepesvári, et al. Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments, 2011, COLT.