[1] Benjamin Van Roy et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[2] Joaquin Quiñonero Candela et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, 2010, ICML.
[3] Sébastien Bubeck et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[4] Jay Bartroff et al. Sequential Experimentation in Clinical Trials, 2013.
[5] David S. Leslie et al. Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2012, J. Mach. Learn. Res.
[6] Nenghai Yu et al. Thompson Sampling for Budgeted Multi-Armed Bandits, 2015, IJCAI.
[7] Y. Freund et al. The non-stochastic multi-armed bandit problem, 2001.
[8] Sébastien Bubeck et al. Prior-free and prior-dependent regret bounds for Thompson Sampling, 2014, 48th Annual Conference on Information Sciences and Systems (CISS).
[9] Gábor Lugosi et al. Prediction, Learning, and Games, 2006.
[10] Benjamin Van Roy et al. An Information-Theoretic Analysis of Thompson Sampling, 2014, J. Mach. Learn. Res.
[11] François Laviolette et al. Domain-Adversarial Training of Neural Networks, 2015, J. Mach. Learn. Res.
[12] Sudipto Guha et al. Stochastic Regret Minimization via Thompson Sampling, 2014, COLT.
[13] Shipra Agrawal et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[14] Jay Bartroff et al. Sequential Experimentation in Clinical Trials: Design and Analysis, 2012.
[15] Lihong Li et al. Generalized Thompson Sampling for Contextual Bandits, 2013, arXiv.
[16] Csaba Szepesvári et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[17] Hiroshi Nakagawa et al. Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays, 2015, ICML.
[18] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933, Biometrika.
[19] Steven L. Scott. A Modern Bayesian Look at the Multi-armed Bandit, 2010.
[20] Shipra Agrawal et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[21] Wei Chu et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[22] Lihong Li et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[23] Sudipto Guha et al. Approximation Algorithms for Bayesian Multi-Armed Bandit Problems, 2013, arXiv.
[24] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[25] Shie Mannor et al. Thompson Sampling for Complex Online Problems, 2013, ICML.
[26] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math.
[27] John Langford et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[28] Tor Lattimore et al. The Pareto Regret Frontier for Bandits, 2015, NIPS.
[29] Rémi Munos et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[30] Yuval Peres et al. Towards Optimal Algorithms for Prediction with Expert Advice, 2014, SODA.
[31] Akimichi Takemura et al. Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors, 2013, AISTATS.
[32] Shipra Agrawal et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.