On Average Versus Discounted Reward Temporal-Difference Learning
暂无分享,去创建一个
[1] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[2] Prasad Tadepalli,et al. Model-Based Average Reward Reinforcement Learning , 1998, Artif. Intell..
[3] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[4] John N. Tsitsiklis,et al. Analysis of Temporal-Diffference Learning with Function Approximation , 1996, NIPS.
[5] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[6] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[7] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[8] John N. Tsitsiklis,et al. Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[9] John N. Tsitsiklis,et al. Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks , 1997, NIPS.
[10] Vivek S. Borkar,et al. Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning Algorithms , 1997, SIAM J. Control. Optim..
[11] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[12] John N. Tsitsiklis,et al. Call admission control and routing in integrated services networks using reinforcement learning , 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).
[13] John N. Tsitsiklis,et al. Call Admission Control and Routing in Integrated Service Networks Using Reinforcement Learning , 2002 .