A Practical Semi-Parametric Contextual Bandit

Classic multi-armed bandit algorithms are sample-inefficient when the number of arms is large. Contextual bandit algorithms are more efficient, but they can incur large regret because reward estimates based on finite-dimensional features are biased. Although recent studies have proposed semi-parametric bandits to overcome these defects, they assume that arms' features are constant over time. This assumption rarely holds in practice, since real-world problems often involve underlying processes that evolve dynamically, especially during special promotions such as Singles' Day sales. In this paper, we formulate a novel Semi-Parametric Contextual Bandit problem that relaxes this assumption. For this problem, we present a novel two-step Upper-Confidence Bound framework, called Semi-Parametric UCB (SPUCB). It can be flexibly applied to linear parametric reward functions and achieves a satisfactory gap-free bound on the n-step regret. Moreover, to make our method more practical in online systems, we propose an optimization for handling the high-dimensional features of a linear function. Extensive experiments on synthetic data, as well as on a real dataset from one of the largest e-commerce platforms, demonstrate the superior performance of our algorithm.
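To make the upper-confidence-bound idea behind this line of work concrete, the sketch below shows a generic LinUCB-style arm selection and update, the standard building block that SPUCB extends. This is an illustrative assumption, not the paper's SPUCB algorithm: the function names, the `alpha` exploration parameter, and the ridge-regularized statistics `A` and `b` are conventional choices, and the semi-parametric (confounder-robust) correction described in the paper is omitted.

```python
import numpy as np

def linucb_choose(features, A, b, alpha=1.0):
    """Pick the arm maximizing theta^T x + alpha * sqrt(x^T A^{-1} x).

    features: (n_arms, d) array of per-arm context vectors.
    A: (d, d) ridge-regularized Gram matrix.
    b: (d,) reward-weighted sum of played feature vectors.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge estimate of the linear reward weights
    # Exploitation term plus a confidence-width exploration bonus per arm.
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
    scores = features @ theta + alpha * bonus
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update of the statistics after observing `reward` for context `x`."""
    A += np.outer(x, x)
    b += reward * x
    return A, b
```

With `A` initialized to the identity and `b` to zeros, early rounds are driven by the exploration bonus; as an arm's direction accumulates observations, its confidence width shrinks and the estimate `theta` dominates.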
