Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games
[1] Jesse Hoey, et al. SPUDD: Stochastic Planning using Decision Diagrams, 1999, UAI.
[2] Richard S. Sutton, et al. Dimensions of Reinforcement Learning, 1998.
[3] Andrew G. Barto, et al. Reinforcement learning, 1998.
[4] Peter Stone, et al. Layered Learning in Multiagent Systems, 1997, AAAI/IAAI.
[5] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[6] Manuela Veloso, et al. Distributed, Play-Based Role Assignment for Robot Teams in Dynamic Environments, 2006, DARS.
[7] Shobha Venkataraman, et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.
[8] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[9] Craig Boutilier, et al. Rewarding Behaviors, 1996, AAAI/IAAI, Vol. 2.
[10] Sridhar Mahadevan, et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results, 2005, Machine Learning.
[11] Tamio Arai, et al. Distributed Autonomous Robotic Systems 3, 1998.
[12] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[13] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[14] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.