
Learning Tetris Using the Noisy Cross-Entropy Method

The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning (RL) seems limited because it often converges to suboptimal policies. We add noise to prevent the premature convergence of the cross-entropy method and demonstrate the approach on the computer game Tetris. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude.
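The following is a minimal NumPy sketch of the noisy cross-entropy idea, not the authors' code. It assumes an independent Gaussian per weight, an elite fraction of 10% out of 100 samples, and a decreasing noise schedule `max(5 - t/10, 0)` that is added to the refitted variance. The names `noisy_cem`, `score`, and `decreasing_noise` are hypothetical. A toy quadratic objective stands in for the Tetris evaluator.

```python
import numpy as np


def decreasing_noise(t):
    # Hypothetical schedule: extra variance shrinks linearly to zero.
    return max(5.0 - t / 10.0, 0.0)


def noisy_cem(score, dim, n_samples=100, elite_frac=0.1, n_iters=50,
              noise=decreasing_noise, init_var=100.0, seed=0):
    """Cross-entropy method with extra noise added to the variance.

    Each iteration samples candidate weight vectors from a diagonal Gaussian,
    keeps the highest-scoring elite_frac of them, refits the mean and variance
    to those elites, and then adds noise(t) to the variance so that the
    distribution does not collapse onto a suboptimal point too early.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    var = np.full(dim, init_var)
    n_elite = max(1, int(round(elite_frac * n_samples)))
    for t in range(n_iters):
        # Draw n_samples candidates from N(mean, diag(var)).
        samples = mean + np.sqrt(var) * rng.standard_normal((n_samples, dim))
        scores = np.array([score(w) for w in samples])
        # Keep the n_elite candidates with the highest scores.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
        # Plain CEM would stop at elite.var(); the noise term keeps exploring.
        var = elite.var(axis=0) + noise(t)
    return mean, var


if __name__ == "__main__":
    target = np.array([1.0, -2.0, 3.0])
    mean, var = noisy_cem(lambda w: -float(np.sum((w - target) ** 2)), dim=3)
    print(mean, var)
```

For Tetris, `score(w)` would instead be the number of lines cleared by a player that chooses moves using the weight vector `w` over board features. That evaluator is not shown here. Passing `noise=lambda t: 0.0` recovers the plain cross-entropy method for comparison.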

István Szita | András Lörincz
