Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces
[1] R. Bellman. Dynamic Programming , 1957, Science.
[3] James S. Albus,et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC) , 1975 .
[4] K. S. Shanmugam,et al. Digital and analog communication systems , 1979 .
[5] R. J. Richards,et al. An Introduction to Dynamics and Control , 1979 .
[6] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[7] Anuradha M. Annaswamy,et al. Stable Adaptive Systems , 1989 .
[8] David W. Aha,et al. Instance‐based prediction of real‐valued attributes , 1989, Comput. Intell..
[9] Sridhar Mahadevan,et al. Scaling Reinforcement Learning to Robotics by Exploiting the Subsumption Architecture , 1991, ML.
[10] Christopher G. Atkeson,et al. Memory-Based Learning Control , 1991, 1991 American Control Conference.
[11] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[12] Pentti Kanerva,et al. Sparse distributed memory and related models , 1993 .
[13] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[14] Andrew McCallum,et al. Instance-Based State Identification for Reinforcement Learning , 1994, NIPS.
[15] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine-mediated learning.
[16] Robert F. Stengel,et al. Optimal Control and Estimation , 1994 .
[17] Chen K. Tham,et al. Reinforcement learning of multiple tasks using a hierarchical CMAC architecture , 1995, Robotics Auton. Syst..
[18] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[19] Pawel Cichosz,et al. Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning , 1994, J. Artif. Intell. Res..
[20] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[21] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[23] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 1996, Machine Learning.
[24] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[25] David K. Smith,et al. Dynamic Programming and Optimal Control. Volume 1 , 1996 .
[26] Martin C. Cooper. Fundamental Properties of Neighbourhood Substitution in Constraint Satisfaction Problems , 1997, Artif. Intell..
[27] Ashwin Ram,et al. Continuous Case-Based Reasoning , 1997, Artif. Intell..