Towards User Scheduling for 6G: A Fairness-Oriented Scheduler Using Multi-Agent Reinforcement Learning

User scheduling is a classical problem and key technology in wireless communication, and it will continue to play an important role in the prospective 6G. Many sophisticated schedulers, such as Proportional Fairness (PF) and Round-Robin Fashion (RRF), are widely deployed in base stations. Opportunistic (OP) scheduling is known to be the optimal scheduler for maximizing the average user data rate (AUDR) under full-buffer traffic, but the optimal strategy for achieving the highest fairness remains largely unknown, for both full-buffer and bursty traffic. In this work, we investigate the problem of fairness-oriented user scheduling, in particular Resource Block Group (RBG) allocation. We build a user scheduler using Multi-Agent Reinforcement Learning (MARL), which performs distributional optimization to maximize the fairness of the communication system. The agents take cross-layer information (e.g., RSRP and buffer size) as their state and the RBG allocation result as their action, and explore the optimal solution under a well-defined reward function designed to maximize fairness. Furthermore, we take the 5th-percentile user data rate (5TUDR) as the key performance indicator (KPI) of fairness, and compare the performance of MARL scheduling against PF and RRF scheduling through extensive simulations. The simulation results show that the proposed MARL scheduling outperforms the traditional schedulers.
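
To make the quantities named above concrete, the following Python sketch computes the 5TUDR fairness KPI from per-user rates and derives one plausible shared reward from it. This is a minimal sketch under stated assumptions: the names (AgentObservation, five_tile_udr, fairness_reward) and the mean-normalization choice are hypothetical illustrations, not the paper's actual implementation.

```python
# Hedged sketch, not the paper's implementation: a minimal illustration of
# the quantities named in the abstract. All names here are hypothetical.
from dataclasses import dataclass
import numpy as np

@dataclass
class AgentObservation:
    """Cross-layer state for one scheduling agent, as described in the abstract."""
    rsrp_dbm: np.ndarray      # per-user Reference Signal Received Power (dBm)
    buffer_bytes: np.ndarray  # per-user downlink buffer occupancy (bytes)

def five_tile_udr(per_user_rates: np.ndarray) -> float:
    """5TUDR: the 5th percentile of per-user data rates (e.g., in Mbps)."""
    return float(np.percentile(per_user_rates, 5))

def fairness_reward(per_user_rates: np.ndarray) -> float:
    """One plausible shared team reward: grows as the worst-off users improve."""
    # Normalizing by the cell-average rate keeps the reward scale-free.
    return five_tile_udr(per_user_rates) / (float(np.mean(per_user_rates)) + 1e-9)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rates = rng.exponential(scale=10.0, size=50)  # 50 users, rates in Mbps
    print(f"5TUDR:           {five_tile_udr(rates):6.2f} Mbps")
    print(f"fairness reward: {fairness_reward(rates):6.3f}")
```

Normalizing the 5TUDR by the cell-average rate is one reasonable design choice for keeping the reward comparable across traffic loads, which matters if agents are trained across both full-buffer and bursty traffic scenarios.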
