Bringing Fairness to Actor-Critic Reinforcement Learning for Network Utility Optimization

Fairness is a crucial design objective in virtually all network optimization problems in which limited system resources are shared by multiple agents. Recently, reinforcement learning has been successfully applied to autonomous online decision making in many network design and optimization problems. However, most of these approaches maximize the long-term (discounted) reward of all agents without taking fairness into account. In this paper, we propose a family of algorithms that bring fairness to actor-critic reinforcement learning for optimizing general fairness utility functions. In particular, we present a novel method for adjusting the rewards in standard reinforcement learning by a multiplicative weight that depends on both the shape of the fairness utility and some statistics of past rewards. We show that, for a proper choice of the adjusted rewards, a policy gradient update converges to at least a stationary point of general α-fairness utility optimization. This result guides the design of fairness optimization algorithms in actor-critic reinforcement learning. Evaluations show that the proposed algorithm can be easily deployed in real-world network optimization problems, such as wireless scheduling and video QoE optimization, and can significantly improve the fairness utility value over previous heuristics and learning algorithms.
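The reward adjustment described above can be illustrated with a minimal sketch. It assumes the standard α-fair utility, U_α(x) = x^(1-α)/(1-α) for α ≠ 1 and log x for α = 1. It also assumes that the multiplicative weight for agent i is the derivative U_α'(x̄_i) = x̄_i^(-α), evaluated at that agent's running-average reward x̄_i. This weight is what the chain rule gives for the gradient of Σ_i U_α(x̄_i). The class FairRewardAdjuster, its method adjust, and the eps floor are hypothetical names chosen for illustration. The paper's exact weighting statistic may differ.

```python
import math
from typing import List, Sequence, Tuple


def alpha_fair_utility(x: float, alpha: float) -> float:
    """Standard alpha-fair utility: log(x) for alpha == 1, else x^(1-alpha)/(1-alpha)."""
    if x <= 0:
        raise ValueError("alpha-fair utility is defined here only for x > 0")
    if math.isclose(alpha, 1.0):
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)


def utility_weight(avg_reward: float, alpha: float) -> float:
    """Derivative U'(x) = x^(-alpha); ASSUMED here to be the multiplicative reward weight."""
    return avg_reward ** (-alpha)


class FairRewardAdjuster:
    """Hypothetical helper: tracks each agent's running-average reward and turns a
    vector of per-agent rewards into one scalar adjusted reward for an actor-critic learner."""

    def __init__(self, n_agents: int, alpha: float, eps: float = 1e-6) -> None:
        self.alpha = alpha
        self.eps = eps  # floor that keeps x^(-alpha) finite when an average is near 0
        self.t = 0  # number of steps observed so far
        self.avg = [0.0] * n_agents

    def adjust(self, rewards: Sequence[float]) -> Tuple[float, List[float]]:
        """Update running averages, then return (sum_i w_i * r_i, weights)."""
        if len(rewards) != len(self.avg):
            raise ValueError("expected one reward per agent")
        self.t += 1
        for i, r in enumerate(rewards):
            # Incremental mean: avg_t = avg_{t-1} + (r_t - avg_{t-1}) / t
            self.avg[i] += (r - self.avg[i]) / self.t
        # With alpha > 0, agents whose average reward is low receive a larger weight.
        weights = [utility_weight(max(a, self.eps), self.alpha) for a in self.avg]
        adjusted = sum(w * r for w, r in zip(weights, rewards))
        return adjusted, weights


if __name__ == "__main__":
    adjuster = FairRewardAdjuster(n_agents=2, alpha=1.0)  # alpha = 1: proportional fairness
    reward, weights = adjuster.adjust([1.0, 4.0])
    print(reward, weights)  # 2.0 [1.0, 0.25]
```

With α = 0, every weight is 1 and the adjusted reward reduces to the plain sum of rewards that standard reinforcement learning maximizes. As α grows, agents with low past rewards receive relatively more weight, which tilts the policy gradient toward fairer allocations. In an actor-critic loop, the scalar returned by adjust would replace the environment reward fed to the critic.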
