A Practical Semi-Parametric Contextual Bandit

Classic multi-armed bandit algorithms are sample-inefficient when the number of arms is large. Contextual bandit algorithms are more efficient, but they can incur large regret because reward estimates based on finite-dimensional features are biased. Although recent studies have proposed semi-parametric bandits to overcome these defects, they assume that arms' features are constant over time. This assumption rarely holds in practice, since real-world problems often involve underlying processes that evolve dynamically, especially during special promotions such as Singles' Day sales. In this paper, we formulate a novel Semi-Parametric Contextual Bandit problem that relaxes this assumption. For this problem, we present a novel two-step Upper-Confidence Bound framework, called Semi-Parametric UCB (SPUCB). It can be flexibly applied to linear parametric reward functions and achieves a satisfactory gap-free bound on the n-step regret. Moreover, to make our method more practical in online systems, we propose an optimization for handling the high-dimensional features of a linear function. Extensive experiments on synthetic data, as well as on a real dataset from one of the largest e-commerce platforms, demonstrate the superior performance of our algorithm.
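To make the upper-confidence-bound idea behind this line of work concrete, the sketch below shows a generic LinUCB-style arm selection and update, the standard building block that SPUCB extends. This is an illustrative assumption, not the paper's SPUCB algorithm: the function names, the `alpha` exploration parameter, and the ridge-regularized statistics `A` and `b` are conventional choices, and the semi-parametric (confounder-robust) correction described in the paper is omitted.

```python
import numpy as np

def linucb_choose(features, A, b, alpha=1.0):
    """Pick the arm maximizing theta^T x + alpha * sqrt(x^T A^{-1} x).

    features: (n_arms, d) array of per-arm context vectors.
    A: (d, d) ridge-regularized Gram matrix.
    b: (d,) reward-weighted sum of played feature vectors.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge estimate of the linear reward weights
    # Exploitation term plus a confidence-width exploration bonus per arm.
    bonus = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
    scores = features @ theta + alpha * bonus
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update of the statistics after observing `reward` for context `x`."""
    A += np.outer(x, x)
    b += reward * x
    return A, b
```

With `A` initialized to the identity and `b` to zeros, early rounds are driven by the exploration bonus; as an arm's direction accumulates observations, its confidence width shrinks and the estimate `theta` dominates.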
