An Empirical Evaluation of Thompson Sampling

Thompson sampling is one of the oldest heuristics for addressing the exploration/exploitation trade-off, yet it remains surprisingly underused in the literature. We present empirical results from applying Thompson sampling to both simulated and real data, and show that it is highly competitive. Since the heuristic is also very easy to implement, we argue that it should be part of the standard set of baselines to compare against.
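To illustrate just how little code the heuristic requires, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with independent Beta(1, 1) priors on each arm. The arm success rates, horizon, and function name below are illustrative assumptions, not values or code from the paper.

```python
# Minimal sketch of Thompson sampling for a K-armed Bernoulli bandit.
# Each arm's success probability gets a Beta(1, 1) (uniform) prior;
# the posterior after s successes and f failures is Beta(s + 1, f + 1).
# `true_rates` and `horizon` are illustrative assumptions.
import random

def thompson_sampling(true_rates, horizon, seed=0):
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k  # per-arm success counts (alpha - 1)
    failures = [0] * k   # per-arm failure counts (beta - 1)
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior
        # and play the arm with the largest sampled value.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    # Three hypothetical arms; the sampler should concentrate on the 0.20 arm.
    print(thompson_sampling([0.10, 0.12, 0.20], horizon=10000))
```

The exploration behavior falls out of the posterior sampling itself: arms with few observations have wide posteriors and are occasionally sampled high, while well-estimated inferior arms are played less and less often.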
