Longbing Cao | Gang Pan | Shijian Li | Longxiang Shi | Long Yang
[1] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[2] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[3] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[4] Shimon Whiteson,et al. A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[5] Pengfei Li,et al. Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network , 2018, IEEE Transactions on Neural Networks and Learning Systems.
[6] Gang Pan,et al. A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning , 2018, IJCAI.
[7] Hado Philip van Hasselt,et al. Insights in reinforcement learning: formal analysis and empirical evaluation of temporal-difference learning algorithms , 2011 .
[8] Peter Dayan,et al. Analytical Mean Squared Error Curves for Temporal Difference Learning , 1996, Machine Learning.
[9] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[10] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[11] Lakhmi C. Jain,et al. Experimental analysis on Sarsa(lambda) and Q(lambda) with different eligibility traces strategies , 2009, J. Intell. Fuzzy Syst.
[12] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[13] D. Barrios-Aranibar,et al. Learning from Delayed Rewards Using Influence Values Applied to Coordination in Multi-Agent Systems , 2007 .
[14] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[15] Richard S. Sutton,et al. Multi-step Reinforcement Learning: A Unifying Algorithm , 2017, AAAI.
[16] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[17] Marc G. Bellemare,et al. Q(λ) with Off-Policy Corrections , 2016, ALT.
[18] Doina Precup,et al. A new Q(lambda) with interim forward view and Monte Carlo equivalence , 2014, ICML.
[19] Marc G. Bellemare,et al. Safe and Efficient Off-Policy Reinforcement Learning , 2016, NIPS.