Counter example for Q-bucket-brigade under prediction problem
[1] M. Pelikán, et al. Analyzing the evolutionary pressures in XCS, 2001.
[2] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[3] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[4] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[5] Stewart W. Wilson. ZCS: A Zeroth Level Classifier System, 1994, Evolutionary Computation.
[6] Larry Bull, et al. A memetic accuracy-based neural learning classifier system, 2005, 2005 IEEE Congress on Evolutionary Computation.
[7] John H. Holland, et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems, 1995.
[8] Artur Merke, et al. Convergence of synchronous reinforcement learning with linear function approximation, 2004, ICML '04.
[9] Keiki Takadama, et al. Learning classifier system equivalent with reinforcement learning with function approximation, 2005, GECCO '05.
[10] Martin V. Butz, et al. Gradient descent methods in learning classifier systems: improving XCS performance in multistep problems, 2005, IEEE Transactions on Evolutionary Computation.
[11] Martin V. Butz, et al. Toward a theory of generalization and learning in XCS, 2004, IEEE Transactions on Evolutionary Computation.
[12] Osamu Katai, et al. Comparing Learning Classifier System and Reinforcement Learning with Function Approximation, 2004.
[13] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[14] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[15] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[16] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[17] Osamu Katai, et al. Learning Classifier System with Convergence and Generalization, 2005.
[18] D. Goldberg, et al. Bounding Learning Time in XCS, 2004, GECCO.
[19] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[20] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[21] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[22] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[23] L. Baird. Reinforcement Learning Through Gradient Descent, 1999.
[24] Stewart W. Wilson. Classifier Fitness Based on Accuracy, 1995, Evolutionary Computation.
[25] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[26] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[27] Pier Luca Lanzi, et al. Learning classifier systems from a reinforcement learning perspective, 2002, Soft Comput.
[28] Marco Dorigo, et al. A comparison of Q-learning and classifier systems, 1994.
[29] Lashon B. Booker. Adaptive value function approximations in classifier systems, 2005, GECCO '05.
[30] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.