A Family of Robust Stochastic Operators for Reinforcement Learning
[1] R. Bass. Convergence of probability measures, 2011.
[2] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[3] L. Baird. Reinforcement Learning Through Gradient Descent, 1999.
[4] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[5] Marc G. Bellemare, et al. Increasing the Action Gap: New Operators for Reinforcement Learning, 2015, AAAI.
[6] Hilbert J. Kappen, et al. Speedy Q-Learning, 2011, NIPS.
[7] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[8] Amir Massoud Farahmand, et al. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.
[9] Kavosh Asadi, et al. An Alternative Softmax Operator for Reinforcement Learning, 2016, ICML.
[10] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[11] Stochastic Orders, 2008.
[12] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[13] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[14] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[15] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[16] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[17] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[18] Dimitri P. Bertsekas, et al. Q-learning and enhanced policy iteration in discounted dynamic programming, 2010, 49th IEEE Conference on Decision and Control (CDC).
[19] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[20] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[21] Andrew W. Moore, et al. Efficient memory-based learning for robot control, 1990.