Sublinear Optimal Policy Value Estimation in Contextual Bandits
[1] Csaba Szepesvári, et al. Empirical Bernstein stopping, 2008, ICML '08.
[2] Kenneth Y. Goldberg, et al. Eigentaste: A Constant Time Collaborative Filtering Algorithm, 2001, Information Retrieval.
[3] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[4] Roman Vershynin, et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.
[5] Alessandro Lazaric, et al. Best-Arm Identification in Linear Bandits, 2014, NIPS.
[6] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[7] Richard G. Baraniuk, et al. A Contextual Bandits Framework for Personalized Learning Action Selection, 2016, EDM.
[8] Oren Somekh, et al. Almost Optimal Exploration in Multi-Armed Bandits, 2013, ICML.
[9] John Langford, et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[10] Kristjan H. Greenewald, et al. Action Centered Contextual Bandits, 2017, NIPS.
[11] Marc G. Bellemare, et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[12] Stefan Wager, et al. Efficient Policy Learning, 2017, arXiv.
[13] Nando de Freitas, et al. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning, 2014, AISTATS.
[14] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[15] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[16] Alessandro Lazaric, et al. Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, 2012, NIPS.
[17] Emma Brunskill, et al. Off-Policy Policy Gradient with State Distribution Correction, 2019, UAI.
[18] S. Chatterjee. An error bound in the Sudakov-Fernique inequality, 2005, arXiv:math/0510424.
[19] Andrew W. Moore, et al. Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation, 1993, NIPS.
[20] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation, 2008, Springer Series in Statistics.
[21] Li Zhou, et al. Latent Contextual Bandits and their Application to Personalized Recommendations for New Users, 2016, IJCAI.
[22] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[23] Masashi Sugiyama, et al. Fully adaptive algorithm for pure exploration in linear bandits, 2017, arXiv:1710.05552.
[24] Zoran Popovic, et al. Where to Add Actions in Human-in-the-Loop Reinforcement Learning, 2017, AAAI.
[25] Emma Brunskill, et al. Value Driven Representation for Human-in-the-Loop Reinforcement Learning, 2019, UMAP.
[26] R. Munos, et al. Best Arm Identification in Multi-Armed Bandits, 2010, COLT.
[27] Gregory Valiant, et al. Estimating Learnability in the Sublinear Data Regime, 2018, NeurIPS.
[28] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[29] Matthew Malloy, et al. lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits, 2013, COLT.