Optimistic Policy Iteration and Learning the Game of Tetris (Itération sur les Politiques Optimiste et Apprentissage du Jeu de Tetris)