Learning Fairness in Multi-Agent Systems

Fairness is essential for human society, contributing to stability and productivity. Similarly, fairness is key to many multi-agent systems. Incorporating fairness into multi-agent learning could help multi-agent systems become both efficient and stable. However, learning efficiency and fairness simultaneously is a complex, multi-objective, joint-policy optimization problem. To tackle these difficulties, we propose FEN, a novel hierarchical reinforcement learning model. We first decompose fairness for each agent and propose a fair-efficient reward that each agent learns its own policy to optimize. To avoid multi-objective conflict, we design a hierarchy consisting of a controller and several sub-policies, where the controller maximizes the fair-efficient reward by switching among the sub-policies, which provide diverse behaviors for interacting with the environment. FEN can be trained in a fully decentralized way, making it easy to deploy in real-world applications. Empirically, we show that FEN easily learns both fairness and efficiency and significantly outperforms baselines in a variety of multi-agent scenarios.
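
The hierarchy described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the form of `fair_efficient_reward` (it grows with the mean utility and shrinks as an agent's own utility deviates from that mean), the names `HierarchicalAgent`, `switch_interval`, and `explore`, and the running-average controller, which stands in for a learned controller policy.

```python
import random
from typing import Any, Callable, List, Sequence


def fair_efficient_reward(own_utility: float, mean_utility: float, eps: float = 0.1) -> float:
    """Illustrative reward (assumed form, not the paper's definition).

    Higher mean utility raises the reward (efficiency); a larger relative
    deviation of the agent's own utility from the mean lowers it (fairness).
    """
    if mean_utility <= 0:
        return 0.0
    deviation = abs(own_utility / mean_utility - 1.0)
    return mean_utility / (eps + deviation)


class HierarchicalAgent:
    """A controller that switches among sub-policies every `switch_interval` steps.

    The controller here is a simple running-average chooser with optional
    random exploration; it is a placeholder for a learned controller.
    """

    def __init__(
        self,
        sub_policies: Sequence[Callable[[Any], Any]],
        switch_interval: int,
        explore: float = 0.1,
        seed: int = 0,
    ) -> None:
        self.sub_policies: List[Callable[[Any], Any]] = list(sub_policies)
        self.switch_interval = switch_interval
        self.explore = explore
        self.values = [0.0] * len(self.sub_policies)  # mean reward per sub-policy
        self.counts = [0] * len(self.sub_policies)
        self.rng = random.Random(seed)
        self.active = 0
        self.steps = 0

    def _select(self) -> int:
        # Explore with probability `explore`, otherwise pick the best estimate.
        if self.rng.random() < self.explore:
            return self.rng.randrange(len(self.sub_policies))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def act(self, observation: Any) -> Any:
        # The controller only intervenes at the start of each interval.
        if self.steps % self.switch_interval == 0:
            self.active = self._select()
        self.steps += 1
        return self.sub_policies[self.active](observation)

    def update(self, reward: float) -> None:
        # Credit the reward to the sub-policy that was active (incremental mean).
        i = self.active
        self.counts[i] += 1
        self.values[i] += (reward - self.values[i]) / self.counts[i]
```

In this sketch each agent would call `act` at every step and pass the value of `fair_efficient_reward` to `update`, so the choice among sub-policies is driven by a single scalar signal rather than by two competing objectives.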
