A Family of Robust Stochastic Operators for Reinforcement Learning

We consider a new family of stochastic operators for reinforcement learning with the goal of alleviating negative effects and becoming more robust to approximation or estimation errors. Various theoretical results are established, which include showing that our family of operators preserve optimality and increase the action gap in a stochastic sense. Our empirical results illustrate the strong benefits of our robust stochastic operators, significantly outperforming the classical Bellman operator and recently proposed operators.

[1]  R. Bass Convergence of probability measures , 2011 .

[2]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[3]  L. Baird Reinforcement Learning Through Gradient Descent , 1999 .

[4]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst..

[5]  Marc G. Bellemare,et al.  Increasing the Action Gap: New Operators for Reinforcement Learning , 2015, AAAI.

[6]  Hilbert J. Kappen,et al.  Speedy Q-Learning , 2011, NIPS.

[7]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[8]  Amir Massoud Farahmand,et al.  Action-Gap Phenomenon in Reinforcement Learning , 2011, NIPS.

[9]  Kavosh Asadi,et al.  An Alternative Softmax Operator for Reinforcement Learning , 2016, ICML.

[10]  John N. Tsitsiklis,et al.  Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.

[11]  Stochastic Orders , 2008 .

[12]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[13]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[14]  Richard S. Sutton,et al.  Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1995, NIPS.

[15]  Csaba Szepesvári,et al.  Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[16]  Pierre Geurts,et al.  Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..

[17]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[18]  Dimitri P. Bertsekas,et al.  Q-learning and enhanced policy iteration in discounted dynamic programming , 2010, 49th IEEE Conference on Decision and Control (CDC).

[19]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[20]  Mahesan Niranjan,et al.  On-line Q-learning using connectionist systems , 1994 .

[21]  Andrew W. Moore,et al.  Efficient memory-based learning for robot control , 1990 .