[1] John Langford, et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information, 2007, NIPS.
[2] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[4] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[5] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[6] Ambuj Tewari, et al. On the Optimality of Perturbations in Stochastic and Adversarial Multi-armed Bandit Problems, 2019, NeurIPS.
[7] Aurélien Garivier, et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[8] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[9] Zheng Wen, et al. New Insights into Bootstrapping for Bandits, 2018, arXiv.
[10] Claudio Gentile, et al. Boltzmann Exploration Done Right, 2017, NIPS.
[11] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[12] Shie Mannor, et al. Thompson Sampling for Complex Online Problems, 2013, ICML.
[13] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[14] Dean Eckles, et al. Thompson sampling with the online bootstrap, 2014, arXiv.
[15] Chih-Wei Hsu, et al. Empirical Bayes Regret Minimization, 2019, arXiv.
[16] Wei Chu, et al. An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, 2011.
[17] Alessandro Lazaric, et al. Linear Thompson Sampling Revisited, 2016, AISTATS.
[18] Benjamin Van Roy, et al. Bootstrapped Thompson Sampling and Deep Exploration, 2015, arXiv.
[19] Robert D. Nowak, et al. Scalable Generalized Linear Bandits: Online Computation and Hashing, 2017, NIPS.
[20] Eric R. Ziegel, et al. Generalized Linear Models, 2002, Technometrics.
[21] Tor Lattimore, et al. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits, 2018, ICML.
[22] Zhi-Hua Zhou, et al. Online Stochastic Linear Optimization under One-bit Feedback, 2015, ICML.
[23] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable, 1979.
[24] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[25] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[26] Lihong Li, et al. Provable Optimal Algorithms for Generalized Linear Contextual Bandits, 2017, arXiv.
[27] Michèle Sebag, et al. Exploration vs Exploitation vs Safety: Risk-Aware Multi-Armed Bandits, 2013, ACML.
[28] Craig Boutilier, et al. Randomized Exploration in Generalized Linear Bandits, 2019, AISTATS.
[29] Craig Boutilier, et al. Perturbed-History Exploration in Stochastic Linear Bandits, 2019, UAI.
[30] Long Tran-Thanh, et al. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation, 2015, NIPS.
[31] Shipra Agrawal, et al. Near-Optimal Regret Bounds for Thompson Sampling, 2017, J. ACM.
[32] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[33] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[34] Jasper Snoek, et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[35] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[36] Shie Mannor, et al. Sub-sampling for Multi-armed Bandits, 2014, ECML/PKDD.
[37] Liang Tang, et al. Personalized Recommendation via Parameter-Free Contextual Bandits, 2015, SIGIR.
[38] Benjamin Van Roy, et al. Ensemble Sampling, 2017, NIPS.
[39] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.