Multi-armed Bandit with Additional Observations

We study multi-armed bandit (MAB) problems with additional observations, where in each round the decision maker selects an arm to play and may additionally observe the rewards of other arms, up to a given budget, by paying a cost per observation. We propose algorithms whose regrets are asymptotically optimal in the stochastic-reward setting and order-optimal in the adversarial-reward setting.
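To make the interaction protocol concrete, here is a minimal sketch of one round of the environment, under assumptions not fixed by the abstract: Bernoulli arms, a per-round observation budget B, and a flat cost c per extra observation. All names (play_round, mean_rewards, and so on) are illustrative, not taken from the paper.

```python
import random

K = 5                                    # number of arms (assumed)
mean_rewards = [0.2, 0.4, 0.5, 0.6, 0.8]  # unknown to the learner
B = 2                                    # max extra observations per round (assumed)
c = 0.1                                  # cost per extra observation (assumed)

def pull(arm):
    """Draw a Bernoulli reward for the given arm."""
    return 1.0 if random.random() < mean_rewards[arm] else 0.0

def play_round(played_arm, observed_arms):
    """One round of the protocol: play one arm, optionally pay to observe others.

    Returns the net payoff (reward of the played arm minus observation costs)
    and the feedback dictionary of all rewards revealed to the learner.
    """
    assert len(observed_arms) <= B, "observation budget exceeded"
    feedback = {played_arm: pull(played_arm)}
    for arm in observed_arms:
        feedback[arm] = pull(arm)        # observed only; not collected as reward
    net_payoff = feedback[played_arm] - c * len(observed_arms)
    return net_payoff, feedback

# Example: play arm 2 and additionally observe arms 0 and 4.
payoff, feedback = play_round(2, [0, 4])
print(payoff, feedback)
```

The point of the sketch is the information/payoff asymmetry: observed arms enlarge the feedback available for learning but only the played arm contributes reward, and each observation reduces the net payoff by its cost.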
