Perturbed-History Exploration in Stochastic Multi-Armed Bandits
Branislav Kveton | Csaba Szepesvári | Mohammad Ghavamzadeh | Craig Boutilier