Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
Bruno Scherrer | Mohammad Ghavamzadeh | Victor Gabillon
[1] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[2] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[3] S. Ioffe,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .
[4] Heidi Burgiel,et al. How to lose at Tetris , 1997, The Mathematical Gazette.
[5] Dimitri P. Bertsekas,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1997 .
[6] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[7] Nikolaus Hansen,et al. Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.
[8] Robert Givan,et al. Approximate Policy Iteration with a Policy Language Bias , 2003, NIPS.
[9] Erik D. Demaine,et al. Tetris is Hard, Even to Approximate , 2003, COCOON.
[10] Michail G. Lagoudakis,et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers , 2003, ICML.
[11] Dirk P. Kroese,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning , 2004 .
[12] Erik D. Demaine,et al. Tetris is hard, even to approximate , 2002, Int. J. Comput. Geom. Appl..
[13] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[14] András Lörincz,et al. Learning Tetris Using the Noisy Cross-Entropy Method , 2006, Neural Computation.
[15] Benjamin Van Roy,et al. Tetris: A Study of Randomized Constraint Sampling , 2006 .
[16] Bruno Scherrer,et al. Building Controllers for Tetris , 2009, J. Int. Comput. Games Assoc..
[17] Bruno Scherrer,et al. Improvements on Learning Tetris with Cross Entropy , 2009, J. Int. Comput. Games Assoc..
[18] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[19] Bruno Scherrer,et al. Classification-based Policy Iteration with a Critic , 2011, ICML.
[20] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[21] D. Barber,et al. A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes , 2012, NIPS.
[22] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012 .
[23] Bruno Scherrer,et al. Performance bounds for λ policy iteration and application to the game of Tetris , 2013, J. Mach. Learn. Res..