Coordinated Reinforcement Learning
Michail G. Lagoudakis | Carlos Guestrin | Ronald Parr
[1] Umberto Bertelè,et al. Nonserial Dynamic Programming , 1972 .
[2] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.
[3] Frank Jensen,et al. From Influence Diagrams to Junction Trees , 1994, UAI.
[4] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Athena Scientific.
[5] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 1996, Machine Learning.
[6] Rina Dechter,et al. Bucket Elimination: A Unifying Framework for Reasoning , 1999, Artif. Intell.
[7] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[8] Kagan Tumer,et al. General principles of learning-based multi-agent systems , 1999, AGENTS '99.
[9] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[10] Andrew W. Moore,et al. Distributed Value Functions , 1999, ICML.
[11] Andrew Y. Ng,et al. Policy Search via Density Estimation , 1999, NIPS.
[12] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.
[13] Peter L. Bartlett,et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.
[14] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[15] Carlos Guestrin,et al. Max-norm Projections for Factored MDPs , 2001, IJCAI.
[16] Michail G. Lagoudakis,et al. Model-Free Least-Squares Policy Iteration , 2001, NIPS.
[17] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.