Event-learning and robust policy heuristics

In this paper we introduce a novel reinforcement learning algorithm called event-learning. The algorithm uses events, i.e., ordered pairs of consecutive states. We define the event-value function and derive the corresponding learning rules. Combining our method with a well-known robust control method, the SDS algorithm, we introduce Robust Policy Heuristics (RPH). We show that RPH, a fast-adapting non-Markovian policy, is particularly useful for coarse models of the environment and may also be useful for some partially observed systems. RPH may help to alleviate the 'curse of dimensionality' problem. Event-learning and RPH can be used to separate the time scales of value-function learning and adaptation. We argue that the definition of modules is straightforward in event-learning and that event-learning makes planning feasible within the RL framework. Computer simulations of a rotational inverted pendulum with coarse discretization demonstrate the principle.
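To make the event-value idea concrete, here is a minimal tabular sketch in Python. It assumes a Q-learning-style update over (state, desired next state) pairs, which is one plausible reading of the learning rules described above; the names (`num_states`, `select_desired_state`, the parameter values) are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal sketch of tabular event-learning (illustrative, not the paper's code).
# An "event" is an ordered pair (s, s_desired): being in state s and trying
# to reach the desired next state s_desired.

num_states = 50           # assumed coarse discretization of the state space
alpha, gamma = 0.1, 0.95  # learning rate and discount factor (assumed values)

# Event-value table: E[s, s_desired] estimates the value of the event (s, s_desired).
E = np.zeros((num_states, num_states))

def select_desired_state(s, epsilon=0.1, rng=np.random.default_rng()):
    """Epsilon-greedy choice of the desired next state for state s."""
    if rng.random() < epsilon:
        return int(rng.integers(num_states))
    return int(np.argmax(E[s]))

def update_event_value(s, s_desired, reward, s_next):
    """One learning step: a Q-learning-style update on the event value.
    The actual transition s -> s_next may differ from the desired one,
    since the lower-level controller realizes events only approximately."""
    target = reward + gamma * np.max(E[s_next])
    E[s, s_desired] += alpha * (target - E[s, s_desired])
```

In the full architecture, the chosen desired state would be handed to a lower-level robust controller (the SDS algorithm in the paper), which produces the actual control action. Note that the update above is indifferent to how the transition was realized, which is one way to understand the separation of time scales between value-function learning and controller adaptation mentioned in the abstract.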
