Multi-agent reinforcement learning as a rehearsal for decentralized planning

[1]  Craig Boutilier,et al.  Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.

[2]  Maja J. Mataric,et al.  Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.

[3]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[4]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[5]  Neil Immerman,et al.  The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.

[6]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[7]  C. Boutilier,et al.  Accelerating Reinforcement Learning through Implicit Imitation , 2003, J. Artif. Intell. Res..

[8]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[9]  François Charpillet,et al.  Point-based Dynamic Programming for DEC-POMDPs , 2006, AAAI.

[10]  Peter Stone,et al.  Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping , 2006, AAAI.

[11]  Bikramjit Banerjee,et al.  General Game Learning Using Knowledge Transfer , 2007, IJCAI.

[12]  Shlomo Zilberstein,et al.  Memory-Bounded Dynamic Programming for DEC-POMDPs , 2007, IJCAI.

[13]  Nikos A. Vlassis,et al.  Q-value Heuristics for Approximate Solutions of Dec-POMDPs , 2007, AAAI Spring Symposium: Game Theoretic and Decision Theoretic Agents.

[14]  Nikos A. Vlassis,et al.  Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..

[15]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[16]  Shimon Whiteson,et al.  Lossless clustering of histories in decentralized POMDPs , 2009, AAMAS.

[17]  Brahim Chaib-draa,et al.  Point-based incremental pruning heuristic for solving finite-horizon DEC-POMDPs , 2009, AAMAS.

[18]  Shlomo Zilberstein,et al.  Incremental Policy Generation for Finite-Horizon DEC-POMDPs , 2009, ICAPS.

[19]  Alain Dutech,et al.  An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs , 2014, J. Artif. Intell. Res..

[20]  Feng Wu,et al.  Rollout Sampling Policy Iteration for Decentralized POMDPs , 2010, UAI.

[21]  Peter Stone,et al.  Combining manual feedback with subsequent MDP reward signals for reinforcement learning , 2010, AAMAS.

[22]  Frans A. Oliehoek,et al.  Heuristic search for identical payoff Bayesian games , 2010, AAMAS.

[23]  Frans A. Oliehoek,et al.  Scaling Up Optimal Heuristic Search in Dec-POMDPs via Incremental Expansion , 2011, IJCAI.

[24]  Victor R. Lesser,et al.  Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs , 2011, AAAI.

[25]  Amir Massoud Farahmand,et al.  Action-Gap Phenomenon in Reinforcement Learning , 2011, NIPS.

[26]  Bikramjit Banerjee,et al.  Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs , 2012, AAAI.

[27]  Bikramjit Banerjee,et al.  Informed Initial Policies for Learning in Dec-POMDPs , 2012, AAAI.

[28]  Bikramjit Banerjee,et al.  Concurrent reinforcement learning as a rehearsal for decentralized planning under uncertainty , 2013, AAMAS.