Importance Resampling for Off-policy Prediction
Martha White | Daniel Graves | Matthew Schlegel | Wesley Chung | Jian Qian