The Effects of Large Disturbances on On-Line Reinforcement Learning for a Walking Robot

Reinforcement Learning is a promising paradigm for adding learning capabilities to humanoid robots. One of the difficulties of the real world is the presence of disturbances. In Reinforcement Learning, disturbances are typically dealt with stochastically. However, large and infrequent disturbances do not fit well in this framework; essentially, they are outliers rather than part of the underlying stochastic Markov Decision Process, and they can therefore negatively influence learning. For a humanoid robot, the main sources of such disturbances are sudden changes in the dynamics (such as an unexpected push), sensor noise, and sampling-time irregularities. We investigate the effects of these types of outliers on the on-line learning process of a simple walking-robot simulation, and we propose to exclude the outliers from the learning process with the aim of improving both convergence and the final solution. While infrequent sensor and timing outliers had a negligible influence, infrequent pushes severely disrupted the learning process. Excluding the outliers from the learning process restored performance.
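To make the proposal concrete, the sketch below shows one plausible way to exclude outlier transitions from an on-line temporal-difference update. It is a minimal illustration, not the paper's implementation: the SARSA update rule, the TD-error threshold, and the sampling-interval check are all assumptions introduced here for exposition.

```python
import numpy as np

# A minimal sketch (not the paper's implementation) of excluding outlier
# transitions from an on-line temporal-difference update. The detection
# criteria below -- a TD-error threshold and a sampling-interval check --
# are illustrative assumptions, not taken from the paper.

class OutlierFilteredTD:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 td_error_limit=5.0, dt_nominal=0.02, dt_tolerance=0.5):
        self.q = np.zeros((n_states, n_actions))
        self.alpha = alpha                     # learning rate
        self.gamma = gamma                     # discount factor
        self.td_error_limit = td_error_limit   # assumed TD-error outlier threshold
        self.dt_nominal = dt_nominal           # nominal sampling period (s)
        self.dt_tolerance = dt_tolerance       # allowed relative timing deviation

    def update(self, s, a, r, s_next, a_next, dt):
        """SARSA-style update that skips transitions flagged as outliers."""
        td_error = r + self.gamma * self.q[s_next, a_next] - self.q[s, a]

        # Timing outlier: the sampling interval deviates too far from nominal.
        timing_outlier = abs(dt - self.dt_nominal) > self.dt_tolerance * self.dt_nominal
        # Dynamics/sensor outlier: the TD error is implausibly large for the
        # learned value landscape (e.g., caused by a sudden push).
        value_outlier = abs(td_error) > self.td_error_limit

        if timing_outlier or value_outlier:
            return False  # exclude this transition from learning
        self.q[s, a] += self.alpha * td_error
        return True
```

In this sketch, a flagged transition is simply dropped, so a rare push or a delayed sample does not corrupt the value estimates; how the outlier flag is actually computed (and whether the update rule is SARSA) would depend on the paper's specific setup.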
