Paper information

Multi-agent reinforcement learning as a rehearsal for decentralized planning

Bikramjit Banerjee | Landon Kraemer

[1] Craig Boutilier, et al. Planning, Learning and Coordination in Multiagent Decision Processes, 1996, TARK.

[2] Maja J. Mataric, et al. Reinforcement Learning in the Multi-Robot Domain, 1997, Auton. Robots.

[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[4] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.

[5] Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.

[6] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.

[7] C. Boutilier, et al. Accelerating Reinforcement Learning through Implicit Imitation, 2003, J. Artif. Intell. Res.

[8] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[9] François Charpillet, et al. Point-based Dynamic Programming for DEC-POMDPs, 2006, AAAI.

[10] Peter Stone, et al. Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping, 2006, AAAI.

[11] Bikramjit Banerjee, et al. General Game Learning Using Knowledge Transfer, 2007, IJCAI.

[12] Shlomo Zilberstein, et al. Memory-Bounded Dynamic Programming for DEC-POMDPs, 2007, IJCAI.

[13] Nikos A. Vlassis, et al. Q-value Heuristics for Approximate Solutions of Dec-POMDPs, 2007, AAAI Spring Symposium: Game Theoretic and Decision Theoretic Agents.

[14] Nikos A. Vlassis, et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs, 2008, J. Artif. Intell. Res.

[15] Bart De Schutter, et al. A Comprehensive Survey of Multiagent Reinforcement Learning, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[16] Shimon Whiteson, et al. Lossless clustering of histories in decentralized POMDPs, 2009, AAMAS.

[17] Brahim Chaib-draa, et al. Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs, 2009, AAMAS.

[18] Shlomo Zilberstein, et al. Incremental Policy Generation for Finite-Horizon DEC-POMDPs, 2009, ICAPS.

[19] Alain Dutech, et al. An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs, 2010, J. Artif. Intell. Res.

[20] Feng Wu, et al. Rollout Sampling Policy Iteration for Decentralized POMDPs, 2010, UAI.

[21] Peter Stone, et al. Combining manual feedback with subsequent MDP reward signals for reinforcement learning, 2010, AAMAS.

[22] Frans A. Oliehoek, et al. Heuristic search for identical payoff Bayesian games, 2010, AAMAS.

[23] Frans A. Oliehoek, et al. Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion, 2011, IJCAI.

[24] Victor R. Lesser, et al. Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs, 2011, AAAI.

[25] Amir Massoud Farahmand, et al. Action-Gap Phenomenon in Reinforcement Learning, 2011, NIPS.

[26] Bikramjit Banerjee, et al. Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs, 2012, AAAI.

[27] Bikramjit Banerjee, et al. Informed Initial Policies for Learning in Dec-POMDPs, 2012, AAAI.

[28] Bikramjit Banerjee, et al. Concurrent reinforcement learning as a rehearsal for decentralized planning under uncertainty, 2013, AAMAS.