Qiang Liu | Yihao Feng | Na Zhang | Ziyang Tang | Jian Peng
[1] Qiang Liu, et al. Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning, 2020, ICLR.
[2] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[3] Qiang Liu, et al. A Kernel Loss for Solving the Bellman Equation, 2019, NeurIPS.
[4] Yao Liu, et al. Representation Balancing MDPs for Off-Policy Policy Evaluation, 2018, NeurIPS.
[5] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[6] Louis Wehenkel, et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[7] Qiang Liu, et al. Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation, 2019, ICLR.
[8] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[9] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[10] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[11] Jun S. Liu. Monte Carlo Strategies in Scientific Computing, 2001, Springer.
[12] Nan Jiang, et al. Doubly Robust Off-policy Evaluation for Reinforcement Learning, 2015, ArXiv.
[13] Gérard D. Cohen, et al. Covering radius - Survey and recent results, 1985, IEEE Trans. Inf. Theory.
[14] Qiang Liu, et al. Accountable Off-Policy Evaluation With Kernel Bellman Statistics, 2020, ICML.
[15] Louis Wehenkel, et al. Batch mode reinforcement learning based on the synthesis of artificial trajectories, 2013, Ann. Oper. Res.
[16] Lihong Li, et al. Policy Certificates: Towards Accountable Reinforcement Learning, 2018, ICML.
[17] Yisong Yue, et al. Batch Policy Learning under Constraints, 2019, ICML.
[18] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[19] Martha White, et al. Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains, 2010, NIPS.
[20] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[21] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.
[22] Peter Stone, et al. Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation, 2016, AAAI.
[23] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2011, WSDM.
[24] Joaquin Quiñonero Candela, et al. Counterfactual reasoning and learning systems: the example of computational advertising, 2013, J. Mach. Learn. Res.
[25] Nan Jiang, et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[26] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes, 2019, ArXiv.
[27] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[28] Mengdi Wang, et al. Learning to Control in Metric Space with Optimal Regret, 2019, 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[29] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, 2019.
[30] Zhao Song, et al. Efficient Model-free Reinforcement Learning in Metric Spaces, 2019, ArXiv.
[31] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[32] Philip S. Thomas, et al. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing, 2017, AAAI.
[33] Bo Dai, et al. GenDICE: Generalized Offline Estimation of Stationary Values, 2020, ICLR.
[34] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[35] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.