Partitioning in reinforcement learning

This paper addresses automatic partitioning in complex, multi-agent reinforcement learning tasks without a priori domain knowledge of task structure. Partitioning a state/input space into multiple regions makes it possible to exploit the differing characteristics of regions and of agents, which facilitates learning and reduces the complexity of individual agents, especially when function approximators are used. We develop a method for optimizing the partitioning of the space through experience, without relying on a priori domain knowledge. The method is tested experimentally and compared with a number of other algorithms. As expected, the multi-agent method with automatic partitioning outperformed single-agent learning.
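To make the idea concrete, below is a minimal sketch, not the paper's actual algorithm, of partitioned multi-agent Q-learning on a toy corridor task: each agent owns its own Q-table, and a per-state assignment score, updated from experienced returns, decides which agent handles which region, so the partition is shaped by experience rather than fixed in advance. The class name `PartitionedQLearner`, the assignment-score update rule, and all constants are illustrative assumptions introduced here for the example.

```python
import numpy as np

# Illustrative sketch only (not the paper's method): a 1-D corridor whose
# state space is split into regions, each handled by its own Q-learning agent.

N_STATES = 20          # states 0..19; goal at the right end
N_ACTIONS = 2          # 0 = step left, 1 = step right
N_AGENTS = 2           # one learner per region
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1


class PartitionedQLearner:
    """Each agent owns a Q-table; a learned score per (agent, state) decides
    which agent acts in which state, so the partition itself is adapted."""

    def __init__(self, rng):
        self.rng = rng
        self.q = np.zeros((N_AGENTS, N_STATES, N_ACTIONS))
        # Region-assignment scores, updated from experienced returns so the
        # partition is learned rather than specified a priori (assumed rule).
        self.score = np.zeros((N_AGENTS, N_STATES))

    def owner(self, s):
        # The agent currently assigned to state s.
        return int(np.argmax(self.score[:, s]))

    def act(self, s):
        a_id = self.owner(s)
        if self.rng.random() < EPS:
            return a_id, int(self.rng.integers(N_ACTIONS))
        return a_id, int(np.argmax(self.q[a_id, s]))

    def update(self, a_id, s, a, r, s2, done):
        # Standard one-step Q-learning update for the acting agent.
        target = r if done else r + GAMMA * self.q[a_id, s2].max()
        self.q[a_id, s, a] += ALPHA * (target - self.q[a_id, s, a])
        # Nudge the acting agent's assignment score toward the TD target, so
        # agents that predict returns well in a state tend to keep that state.
        self.score[a_id, s] += ALPHA * (target - self.score[a_id, s])


def run_episode(learner, rng, max_steps=100):
    s = 0
    for _ in range(max_steps):
        a_id, a = learner.act(s)
        s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        done = s2 == N_STATES - 1
        r = 1.0 if done else -0.01      # small step cost, reward at the goal
        learner.update(a_id, s, a, r, s2, done)
        if done:
            return
        s = s2


rng = np.random.default_rng(0)
learner = PartitionedQLearner(rng)
for _ in range(200):
    run_episode(learner, rng)
print("states owned by agent 0:", np.where(np.argmax(learner.score, 0) == 0)[0])
```

The design choice illustrated here is the key one from the abstract: responsibility for regions is not hand-assigned but emerges from experience, and each agent only needs to represent values over its own region, which is what keeps individual agents simple when function approximators replace the tables.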
