Accountable Off-Policy Evaluation With Kernel Bellman Statistics
Qiang Liu | Yihao Feng | Tongzheng Ren | Ziyang Tang