Approximate modified policy iteration and its application to the game of Tetris
Matthieu Geist | Bruno Scherrer | Mohammad Ghavamzadeh | Boris Lesner | Victor Gabillon
[1] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[2] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[3] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[4] S. Ioffe,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1996 .
[5] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[6] Heidi Burgiel,et al. How to lose at Tetris , 1997, The Mathematical Gazette.
[7] Dimitri P. Bertsekas,et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming , 1997 .
[8] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.
[9] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[10] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[11] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[12] Nikolaus Hansen,et al. Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.
[13] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[14] Robert Givan,et al. Approximate Policy Iteration with a Policy Language Bias , 2003, NIPS.
[15] Erik D. Demaine,et al. Tetris is Hard, Even to Approximate , 2003, COCOON.
[16] Rémi Munos,et al. Error Bounds for Approximate Policy Iteration , 2003, ICML.
[17] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[18] Michail G. Lagoudakis,et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers , 2003, ICML.
[19] Satinder Singh,et al. An upper bound on the loss from approximate optimal-value functions , 1994, Machine Learning.
[20] Erik D. Demaine,et al. Tetris is hard, even to approximate , 2002, Int. J. Comput. Geom. Appl..
[21] Dirk P. Kroese,et al. The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-carlo Simulation (Information Science and Statistics) , 2004 .
[22] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[23] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[24] András Lörincz,et al. Learning Tetris Using the Noisy Cross-Entropy Method , 2006, Neural Computation.
[25] Lih-Yuan Deng,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning , 2006, Technometrics.
[26] Benjamin Van Roy,et al. Tetris: A Study of Randomized Constraint Sampling , 2006 .
[27] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[28] Rémi Munos,et al. Performance Bounds in Lp-norm for Approximate Value Iteration , 2007, SIAM J. Control. Optim..
[29] Csaba Szepesvári,et al. Finite-Time Bounds for Fitted Value Iteration , 2008, J. Mach. Learn. Res..
[30] Christos Dimitrakakis,et al. Rollout sampling approximate policy iteration , 2008, Machine Learning.
[31] Bruno Scherrer,et al. Building Controllers for Tetris , 2009, J. Int. Comput. Games Assoc..
[32] Bruno Scherrer,et al. Improvements on Learning Tetris with Cross Entropy , 2009, J. Int. Comput. Games Assoc..
[33] Alessandro Lazaric,et al. Analysis of a Classification-based Policy Iteration Algorithm , 2010, ICML.
[34] B. Scherrer,et al. Least-Squares Policy Iteration: Bias-Variance Trade-off in Control Problems , 2010, ICML.
[35] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[36] B. Scherrer,et al. Performance bound for Approximate Optimistic Policy Iteration , 2010 .
[37] U. Rieder,et al. Markov Decision Processes , 2010 .
[38] Csaba Szepesvári,et al. Error Propagation for Approximate Policy and Value Iteration , 2010, NIPS.
[39] Csaba Szepesvári,et al. Reinforcement Learning Algorithms for MDPs , 2011 .
[40] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.
[41] Bruno Scherrer,et al. Classification-based Policy Iteration with a Critic , 2011, ICML.
[42] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012, ICML.
[43] D. Barber,et al. A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes , 2012, NIPS.
[44] Matthieu Geist,et al. Approximate Modified Policy Iteration , 2012 .
[45] Alessandro Lazaric,et al. Finite-sample analysis of least-squares policy iteration , 2012, J. Mach. Learn. Res..
[46] Bruno Scherrer,et al. Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies , 2013, ArXiv.
[47] Bruno Scherrer. Performance bounds for λ policy iteration and application to the game of Tetris , 2013 .
[48] Bruno Scherrer,et al. Performance bounds for λ policy iteration and application to the game of Tetris , 2013, J. Mach. Learn. Res..
[49] Uriel G. Rothblum,et al. (Approximate) iterated successive approximations algorithm for sequential decision processes , 2013, Ann. Oper. Res..
[50] Bruno Scherrer,et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris , 2013, NIPS.
[51] Matthieu Geist,et al. Off-policy learning with eligibility traces: a survey , 2013, J. Mach. Learn. Res..