Selective neural network ensembles in reinforcement learning: Taking the advantage of many agents

Abstract Ensemble models can achieve more accurate and robust predictions than single learners. A selective ensemble may further improve the predictions by selecting a subset of the models from the entire ensemble according to a quality criterion. We consider reinforcement learning ensembles whose members are artificial neural networks. In this context, we extensively evaluate a recently introduced algorithm for ensemble subset selection in reinforcement learning scenarios. The selection strategy aims to choose members whose weak decisions on collected states are compensated by the strong decisions of other members, where the quality of a decision is measured by the Bellman error. In our empirical evaluations, we compare the benchmark performance of full ensembles and selective ensembles in the generalized maze and in SZ-Tetris, both of which are environments with large state spaces. We found that although the selective ensembles consist of only a small number of agents, they significantly outperform the full ensembles. We therefore conclude that selecting an informative subset of many agents can be more effective than training single agents or using full ensembles.
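To make the selection idea concrete, below is a minimal sketch, not the paper's actual algorithm: it greedily adds members whose averaged value estimates minimize the mean squared Bellman error over a set of collected transitions, so a member with weak individual decisions can still be kept if the rest of the subset compensates for it. All names (`select_subset`, `member_v`, the greedy forward search, the discount `gamma`) are illustrative assumptions; the paper may formulate the selection differently, e.g. as a combinatorial optimization over member decisions.

```python
import numpy as np

def bellman_errors(v, rewards, next_v, gamma=0.95):
    """Per-state Bellman (TD) errors for an averaged value estimate.

    v, next_v: arrays of shape (n_states,) holding the subset-averaged value
    of each collected state and of its successor state.
    """
    return rewards + gamma * next_v - v

def select_subset(member_v, member_next_v, rewards, k, gamma=0.95):
    """Greedy forward selection of k ensemble members (illustrative sketch).

    member_v, member_next_v: shape (n_members, n_states); each row holds one
    member's value estimates for the collected states / successor states.
    At every step the member that most reduces the mean squared Bellman error
    of the averaged subset prediction is added.
    """
    selected = []
    remaining = list(range(member_v.shape[0]))
    while len(selected) < k and remaining:
        best, best_err = None, np.inf
        for m in remaining:
            idx = selected + [m]
            v = member_v[idx].mean(axis=0)
            next_v = member_next_v[idx].mean(axis=0)
            err = np.mean(bellman_errors(v, rewards, next_v, gamma) ** 2)
            if err < best_err:
                best, best_err = m, err
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 10 members, 50 collected transitions with random estimates.
rng = np.random.default_rng(0)
member_v = rng.normal(size=(10, 50))
member_next_v = rng.normal(size=(10, 50))
rewards = rng.normal(size=50)
print(select_subset(member_v, member_next_v, rewards, k=3))
```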
