An Ensemble of Linearly Combined Reinforcement-Learning Agents

Reinforcement-learning (RL) algorithms are often tweaked and tuned to specific environments when applied, calling into question whether learning can truly be considered autonomous in these cases. In this work, we show how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning. Our approach learns a weighted linear combination of Q-values from multiple independent learning algorithms. In our evaluations in generalized RL environments, we find that the algorithm compares favorably to the best tuned algorithm. Our work provides a promising basis for further study into the use of ensemble methods in RL.

[1]  Kevin Leyton-Brown,et al.  Hydra: Automatically Configuring Algorithms for Portfolio-Based Selection , 2010, AAAI.

[2]  Shimon Whiteson,et al.  Generalized Domains for Empirical Evaluations in Reinforcement Learning , 2009 .

[3]  Jennifer Chu-Carroll,et al.  Building Watson: An Overview of the DeepQA Project , 2010, AI Mag..

[4]  John N. Tsitsiklis,et al.  Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.

[5]  Marco Wiering,et al.  Ensemble Algorithms in Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[6]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[7]  Andrew Y. Ng,et al.  Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.

[8]  Michael L. Littman,et al.  A probabilistic approach to solving crossword puzzles , 2002, Artif. Intell..

[9]  Yehuda Koren,et al.  All Together Now: A Perspective on the Netflix Prize , 2010 .

[10]  Ronen I. Brafman,et al.  R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..

[11]  Shimon Whiteson,et al.  The Reinforcement Learning Competitions , 2010 .

[12]  Justin A. Boyan,et al.  Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.

[13]  David Yarowsky,et al.  Modeling Consensus: Classifier Combination for Word Sense Disambiguation , 2002, EMNLP.