Conditional Importance Sampling for Off-Policy Learning
Tom Schaul | Hado van Hasselt | Diana Borsa | Anna Harutyunyan | Will Dabney | Mark Rowland | Rémi Munos
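For context on the paper's topic, the sketch below contrasts ordinary (trajectory-level) importance sampling with per-decision importance sampling for off-policy evaluation, the estimator family that this paper and several of the references below build on. This is illustrative background only, under assumed toy inputs; the function names and the rollout are hypothetical, not the paper's proposed estimator.

```python
import numpy as np

def trajectory_is(rhos, rewards, gamma=0.99):
    """Ordinary IS: weight the full discounted return by the product of
    per-step ratios rho_t = pi(a_t|s_t) / mu(a_t|s_t)."""
    weight = np.prod(rhos)
    ret = sum(gamma**t * r for t, r in enumerate(rewards))
    return weight * ret

def per_decision_is(rhos, rewards, gamma=0.99):
    """Per-decision IS: the reward at step t is weighted only by the
    ratios of the actions taken up to and including step t."""
    cum_weights = np.cumprod(rhos)
    return sum(gamma**t * cum_weights[t] * r for t, r in enumerate(rewards))

# Hypothetical 5-step rollout: the behaviour policy mu chose the logged
# actions with probability 0.5; the target policy pi would choose them
# with probability 0.8, so every per-step ratio is 0.8 / 0.5.
rng = np.random.default_rng(0)
T = 5
rhos = np.full(T, 0.8 / 0.5)        # importance ratios rho_t
rewards = rng.normal(size=T)        # stand-in rewards

print("trajectory IS estimate:  ", trajectory_is(rhos, rewards))
print("per-decision IS estimate:", per_decision_is(rhos, rewards))
```

Both estimators are unbiased for the target policy's value, but the per-decision variant typically has lower variance because later rewards are not multiplied by ratios of actions taken after them.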
[1] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[2] Yifei Ma, et al. Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling, 2019, NeurIPS.
[3] Christian P. Robert and George Casella. Monte Carlo Statistical Methods, 1999, Springer.
[4] Richard S. Sutton, et al. Weighted Importance Sampling for Off-Policy Learning with Linear Function Approximation, 2014, NIPS.
[5] Philip S. Thomas, et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, 2016, ICML.
[6] Marc G. Bellemare, et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift, 2019, AAAI.
[7] Marc G. Bellemare, et al. Distributional Reinforcement Learning with Quantile Regression, 2017, AAAI.
[8] Shie Mannor, et al. Consistent On-Line Off-Policy Evaluation, 2017, ICML.
[9] J. Norris. Appendix: Probability and Measure, 1997.
[10] Nicolas Le Roux, et al. A Geometric Perspective on Optimal Representations for Reinforcement Learning, 2019, NeurIPS.
[11] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction, 1998, MIT Press.
[12] Patrick M. Pilarski, et al. Horde: A Scalable Real-Time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction, 2011, AAMAS.
[13] R. Sutton, et al. Gradient Temporal-Difference Learning Algorithms, 2011.
[14] Yao Liu, et al. Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling, 2020, ICML.
[15] Yifei Ma, et al. Marginalized Off-Policy Evaluation for Reinforcement Learning, 2019, NeurIPS.
[16] Nan Jiang, et al. Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning, 2015, ICML.
[17] Masatoshi Uehara, et al. Minimax Weight and Q-Function Learning for Off-Policy Evaluation, 2019, ICML.
[18] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[19] Dimitri P. Bertsekas, et al. Stochastic Optimal Control: The Discrete Time Case, 2007.
[20] Philip S. Thomas, et al. High-Confidence Off-Policy Evaluation, 2015, AAAI.
[21] Csaba Szepesvári. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[22] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[23] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[24] Peter Stone, et al. Importance Sampling Policy Evaluation with an Estimated Behavior Policy, 2018, ICML.
[25] Marcello Restelli, et al. Policy Optimization via Importance Sampling, 2018, NeurIPS.
[26] Simo Särkkä. Bayesian Filtering and Smoothing, 2013, Institute of Mathematical Statistics Textbooks.
[27] Shane Legg, et al. Human-Level Control through Deep Reinforcement Learning, 2015, Nature.
[28] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[29] Martha White, et al. An Emphatic Approach to the Problem of Off-Policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[30] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[31] Richard S. Sutton, et al. Off-Policy TD(λ) with a True Online Equivalence, 2014, UAI.
[32] Francis Bach, et al. SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[33] Masatoshi Uehara, et al. Efficiently Breaking the Curse of Horizon: Double Reinforcement Learning in Infinite-Horizon Processes, 2019, arXiv.
[34] Richard S. Sutton, et al. Multi-Step Off-Policy Learning Without Importance Sampling Ratios, 2017, arXiv.
[35] Masatoshi Uehara, et al. Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, 2019, J. Mach. Learn. Res.
[36] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[37] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[38] Masashi Sugiyama, et al. Nonparametric Return Distribution Approximation for Reinforcement Learning, 2010, ICML.
[39] Lihong Li, et al. Stochastic Variance Reduction Methods for Policy Evaluation, 2017, ICML.
[40] Sean R. Eddy. What Is Dynamic Programming?, 2004, Nature Biotechnology.
[41] Marcello Restelli, et al. Optimistic Policy Optimization via Multiple Importance Sampling, 2019, ICML.
[42] Tong Zhang, et al. Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction, 2013, NIPS.
[43] A. Rollett, et al. The Monte Carlo Method, 2004.
[44] Yuval Tassa, et al. Continuous Control with Deep Reinforcement Learning, 2015, ICLR.
[45] Shie Mannor, et al. Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis, 2015, AAAI.
[46] Philip S. Thomas, et al. High Confidence Policy Improvement, 2015, ICML.
[47] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[48] Bo Dai, et al. DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, 2019, NeurIPS.