Conservative Contextual Linear Bandits
Benjamin Van Roy | Mohammad Ghavamzadeh | Abbas Kazerouni | Yasin Abbasi
[1] Csaba Szepesvári et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[2] John N. Tsitsiklis et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[3] Wei Chu et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[4] Thorsten Joachims et al. Batch learning from logged bandit feedback through counterfactual risk minimization, 2015, J. Mach. Learn. Res.
[5] Joaquin Quiñonero Candela et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[6] Nan Jiang et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[7] Marek Petrik et al. Safe Policy Improvement by Minimizing Robust Baseline Regret, 2016, NIPS.
[8] Thorsten Joachims et al. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback, 2015, ICML.
[9] Philip S. Thomas et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[10] Benjamin Van Roy et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[11] Yifan Wu et al. Conservative Bandits, 2016, ICML.
[12] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[13] Philip S. Thomas et al. High Confidence Policy Improvement, 2015, ICML.
[14] Thomas P. Hayes et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.