DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning

In fully cooperative multi-agent reinforcement learning (MARL) settings, environments are highly stochastic due to the partial observability of each agent and the continuously changing policies of the other agents. To address this challenge, we integrate distributional RL with value function factorization methods by proposing a Distributional Value Function Factorization (DFAC) framework, which generalizes expected value function factorization methods to their DFAC variants. DFAC extends the individual utility functions from deterministic variables to random variables and models the quantile function of the total return as a quantile mixture. To validate DFAC, we first demonstrate its ability to factorize a simple two-step matrix game with stochastic rewards, and then perform experiments on all Super Hard tasks of the StarCraft Multi-Agent Challenge, showing that DFAC outperforms expected value function factorization baselines.
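As a rough illustration of the quantile-mixture idea (the notation here is generic and may differ from the paper's exact formulation), the quantile function of the total return is expressed as a non-negatively weighted combination of the agents' individual return quantile functions:

\[
F_{Z_{\mathrm{tot}}}^{-1}(\tau) \;=\; \sum_{i=1}^{n} w_i \, F_{Z_i}^{-1}(\tau), \qquad w_i \ge 0, \quad \tau \in [0, 1],
\]

where \(F_{Z_i}^{-1}\) denotes the quantile function of agent \(i\)'s utility, modeled as a random variable, and the \(w_i\) are mixture weights. Because each \(F_{Z_i}^{-1}\) is non-decreasing in \(\tau\) and the weights are non-negative, the combination remains non-decreasing and is therefore itself a valid quantile function, which is the property a quantile mixture relies on.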
