Yao Liu | Adith Swaminathan | Alekh Agarwal | Emma Brunskill
[1] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[2] Louis Wehenkel,et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[3] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[4] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[5] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[7] Qiang Liu,et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation , 2018, NeurIPS.
[8] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[9] Shimon Whiteson,et al. Generalized Off-Policy Actor-Critic , 2019, NeurIPS.
[10] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..
[11] Mehrdad Farajtabar,et al. More Robust Doubly Robust Off-policy Evaluation , 2018, ICML.
[12] R. Sutton,et al. Gradient temporal-difference learning algorithms , 2011 .
[13] V. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint , 2008 .
[14] Nan Jiang,et al. Doubly Robust Off-policy Value Evaluation for Reinforcement Learning , 2015, ICML.
[15] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[16] Shie Mannor,et al. Consistent On-Line Off-Policy Evaluation , 2017, ICML.
[17] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[18] Miroslav Dudík,et al. Optimal and Adaptive Off-policy Evaluation in Contextual Bandits , 2016, ICML.
[19] Martha White,et al. Linear Off-Policy Actor-Critic , 2012, ICML.
[20] Philip S. Thomas,et al. Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning , 2016, ICML.
[21] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[22] Henry Zhu,et al. Soft Actor-Critic Algorithms and Applications , 2018, arXiv.
[23] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[24] Martha White,et al. An Off-policy Policy Gradient Theorem Using Emphatic Weightings , 2018, NeurIPS.
[25] Yao Liu,et al. Representation Balancing MDPs for Off-Policy Policy Evaluation , 2018, NeurIPS.
[26] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[27] Richard E. Turner,et al. Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning , 2017, NIPS.
[28] Marc G. Bellemare,et al. Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift , 2019, AAAI.
[29] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[30] Marcello Restelli,et al. Policy Optimization via Importance Sampling , 2018, NeurIPS.
[31] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[32] Philip S. Thomas,et al. High Confidence Policy Improvement , 2015, ICML.
[33] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.