Optimistic Bayesian Sampling in Contextual-Bandit Problems
Benedict C. May | Nathan Korda | Anthony Lee | David S. Leslie
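For background on the approach the title names: optimistic Bayesian sampling is a variant of Thompson sampling ([1], [30], [35]), in which each arm's reward probability carries a Bayesian posterior, one sample is drawn from each posterior, and the arm with the largest sample is played. A minimal sketch for Bernoulli arms with Beta posteriors (function names `thompson_step` and `run_bandit` are illustrative, not from the paper, and the optimistic adjustment of the paper itself is omitted):

```python
import random

def thompson_step(successes, failures, rng=random):
    # One Thompson-sampling decision: draw a Beta(s+1, f+1) posterior
    # sample per arm and play the arm with the largest draw.
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(probs, horizon=2000, seed=0):
    # Simulate Bernoulli arms with true success probabilities `probs`.
    rng = random.Random(seed)
    k = len(probs)
    succ, fail = [0] * k, [0] * k
    for _ in range(horizon):
        arm = thompson_step(succ, fail, rng)
        if rng.random() < probs[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

Because posterior draws concentrate on arms with high observed success rates while still occasionally sampling uncertain arms, play shifts toward the best arm over time; this exploration-by-sampling is what the referenced regret analyses ([35]) formalize.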
[1] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.
[2] F. Eicker. Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions, 1963.
[3] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[4] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[5] W. Beyer. CRC Standard Probability and Statistics Tables and Formulae, 1990.
[6] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[7] Csaba Szepesvári, et al. A Generalized Reinforcement-Learning Model: Convergence and Applications, 1996, ICML.
[8] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[9] Tze Leung Lai, et al. Incomplete learning from endogenous data in dynamic allocation, 1999.
[10] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res..
[11] Yuhong Yang, et al. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates, 2002.
[12] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[13] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[14] Paul Bourgine, et al. Exploration of Multi-State Environments: Local Measures and Back-Propagation of Uncertainty, 1999, Machine Learning.
[15] Leslie Pack Kaelbling, et al. Associative Reinforcement Learning: Functions in k-DNF, 1994, Machine Learning.
[16] Tao Wang, et al. Bayesian sparse sampling for on-line reward optimization, 2005, ICML.
[17] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[18] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[19] Csaba Szepesvári, et al. Tuning Bandit Algorithms in Stochastic Environments, 2007, ALT.
[20] Ole-Christoffer Granmo, et al. A Bayesian Learning Automaton for Solving Two-Armed Bernoulli Bandit Problems, 2008, Seventh International Conference on Machine Learning and Applications.
[21] Dimitris K. Tasoulis, et al. Simulation Studies of Multi-armed Bandits with Covariates (Invited Paper), 2008, Tenth International Conference on Computer Modeling and Simulation (UKSim 2008).
[22] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[23] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci..
[24] Joaquin Quiñonero Candela, et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, 2010, ICML.
[25] Aurélien Garivier, et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[26] Jean-Yves Audibert, et al. Regret Bounds and Minimax Policies under Partial Monitoring, 2010, J. Mach. Learn. Res..
[27] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[28] Steven L. Scott. A modern Bayesian look at the multi-armed bandit, 2010.
[29] Benedict C. May. Simulation Studies in Optimistic Bayesian Sampling in Contextual-Bandit Problems, 2011.
[30] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[31] Aleksandrs Slivkins. Contextual Bandits with Similarity Information, 2009, COLT.
[32] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[33] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2011, WSDM '11.
[34] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[35] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[36] T. L. Lai, H. Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.