Double Thompson Sampling for Dueling Bandits
[1] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[2] Thorsten Joachims, et al. The K-armed Dueling Bandits Problem, 2012, COLT.
[3] Shie Mannor, et al. Thompson Sampling for Learning Parameterized Markov Decision Processes, 2014, COLT.
[4] Sébastien Bubeck. Bandits Games and Clustering Foundations, 2010.
[5] Robert D. Nowak, et al. Sparse Dueling Bandits, 2015, AISTATS.
[6] Hiroshi Nakagawa, et al. Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays, 2015, ICML.
[7] Thorsten Joachims, et al. Beat the Mean Bandit, 2011, ICML.
[8] Nenghai Yu, et al. Thompson Sampling for Budgeted Multi-Armed Bandits, 2015, IJCAI.
[9] Katja Hofmann, et al. Contextual Dueling Bandits, 2015, COLT.
[10] Hiroshi Nakagawa, et al. Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm, 2016, ICML.
[11] Benjamin Van Roy, et al. An Information-Theoretic Analysis of Thompson Sampling, 2014, J. Mach. Learn. Res.
[12] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[13] Thorsten Joachims, et al. Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, 2009, ICML.
[14] Raphaël Féraud, et al. Generic Exploration and K-armed Voting Bandits, 2013, ICML.
[15] M. de Rijke, et al. Relative Confidence Sampling for Efficient On-line Ranker Evaluation, 2014, WSDM.
[16] Thorsten Joachims, et al. Reducing Dueling Bandits to Cardinal Bandits, 2014, ICML.
[17] M. de Rijke, et al. Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem, 2013, ICML.
[18] Shie Mannor, et al. Thompson Sampling for Complex Online Problems, 2013, ICML.
[19] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[20] Alessandro Panconesi, et al. Concentration of Measure for the Analysis of Randomized Algorithms, 2009.
[21] Hiroshi Nakagawa, et al. Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem, 2015, COLT.
[22] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[23] M. de Rijke, et al. Copeland Dueling Bandits, 2015, NIPS.
[24] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[25] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[26] R. Srikant, et al. Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits, 2015, NIPS.