Aggregation of Reinforcement Learning Algorithms

Reinforcement learning (RL) is a machine learning method that can learn an optimal strategy for a system without knowing the system's mathematical model. Many RL algorithms have been applied successfully in various fields, but each has its own advantages and disadvantages. As environments and tasks grow more complex, it is difficult for a single learning algorithm to handle complicated learning problems with high performance. This motivates combining several learning algorithms to improve learning quality. This paper proposes a new multiple-learning architecture, the "aggregated multiple reinforcement learning system" (AMRLS). AMRLS runs three different learning algorithms individually and then combines their results with aggregation methods. To evaluate its performance, AMRLS is tested in two environments: a cart-pole system and a maze. The presented simulation results reveal that aggregation not only provides robustness and fault tolerance, but also produces smoother learning curves and requires fewer learning steps than the individual algorithms.
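The abstract does not specify which aggregation rule AMRLS uses, but the core idea of running several value-based learners in parallel and combining their action choices can be sketched as follows. This is a minimal, hypothetical illustration: the three tabular learners, the majority-vote rule, and the Q-learning-style update are assumptions for the sake of example, not the paper's actual design.

```python
import random
from collections import Counter

random.seed(0)

N_STATES, N_ACTIONS = 5, 2

# Three independent tabular learners (stand-ins for the three
# algorithms combined in AMRLS; any value-based learners would do
# for this illustration).
q_tables = [[[0.0] * N_ACTIONS for _ in range(N_STATES)]
            for _ in range(3)]

def greedy(q, s):
    """Greedy action of one learner in state s (ties -> lowest index)."""
    row = q[s]
    return row.index(max(row))

def aggregate_action(state):
    """Majority vote over the learners' greedy actions -- one simple
    aggregation rule; the paper may use other combination methods."""
    votes = Counter(greedy(q, state) for q in q_tables)
    return votes.most_common(1)[0][0]

def td_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One-step TD update for a single learner (Q-learning target shown;
    e.g. SARSA would differ only in how the target is formed)."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

In this sketch each learner keeps its own value table and is updated with its own rule, while the agent's executed action comes from the vote, so a single poorly performing learner can be outvoted by the others, which is one way the fault-tolerance claim in the abstract could arise.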
