Scaling Up Average Reward Reinforcement Learning by Approximating the Domain Models and the Value Function
暂无分享,去创建一个
[1] George C. Canavos,et al. Applied probability and statistical methods , 1984 .
[2] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .
[3] A. Jalali,et al. Computationally efficient adaptive control algorithms for Markov chains , 1989, Proceedings of the 28th IEEE Conference on Decision and Control,.
[4] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[5] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[6] Ann E. Nicholson,et al. The Data Association Problem when Monitoring Robot Vehicles Using Dynamic Belief Networks , 1992, ECAI.
[7] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[8] Uffe Kjærulff,et al. A Computational Scheme for Reasoning in Dynamic Probabilistic Networks , 1992, UAI.
[9] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[10] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[11] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[12] S. Schaal,et al. Robot juggling: implementation of memory-based learning , 1994, IEEE Control Systems.
[13] Prasad Tadepalli,et al. H-Learning: A Reinforcement Learning Method for Optimizing Undiscounted Average Reward , 1994 .
[14] Craig Boutilier,et al. Process-Oriented Planning and Average-Reward Optimality , 1995, IJCAI.
[15] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[16] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[17] Craig Boutilier,et al. Exploiting Structure in Policy Construction , 1995, IJCAI.
[18] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[19] Prasad Tadepalli,et al. Auto-Exploratory Average Reward Reinforcement Learning , 1996, AAAI/IAAI, Vol. 1.
[20] Sridhar Mahadevan,et al. Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning , 1996, ICML.