Reinforcement Learning Random Access for Delay-Constrained Heterogeneous Wireless Networks: A Two-User Case

In this paper, we investigate the random access problem for a delay-constrained heterogeneous wireless network. As a first attempt to study this new problem, we consider a network with two users who deliver delay-constrained traffic to an access point (AP) over a common, unreliable collision wireless channel. Assuming that one user (user 1) adopts ALOHA, we aim to optimize the random access scheme of the other user (user 2). The most intriguing part of this problem is that user 2 has no information about user 1 but must still maximize the system timely throughput. Such a paradigm of collaborative spectrum sharing is envisioned by DARPA as a way to better match supply and demand dynamically in future networks [1], [2]. We first propose a Markov Decision Process (MDP) formulation to derive a model-based upper bound, which quantifies the performance gap of any designed scheme. We then use reinforcement learning (RL) to design an R-learning-based [3]–[5] random access scheme, called TSRA. Extensive simulations show that TSRA achieves close-to-upper-bound performance and outperforms the existing baseline DLMA [6], the counterpart of our scheme for delay-unconstrained heterogeneous wireless networks.
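For readers unfamiliar with R-learning, the following is a minimal sketch of the textbook average-reward update rule on which schemes of this kind are built. It is not the TSRA algorithm itself: the state encoding, the two-action set (wait or transmit), the step sizes `alpha` and `beta`, and the function name `r_learning_step` are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed action set for user 2: 0 = stay silent, 1 = transmit.
ACTIONS = (0, 1)


def r_learning_step(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one tabular R-learning update.

    q       : dict mapping (state, action) -> relative action value (updated in place)
    rho     : current estimate of the average reward per slot
    s, a    : state and action taken in this slot
    r       : reward observed (e.g. 1 if a packet was delivered before its deadline)
    s_next  : state observed at the start of the next slot
    Returns the updated average-reward estimate rho.
    """
    # Relative-value update: the reward is measured against the average reward rho.
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r - rho + best_next - q[(s, a)])

    # The average-reward estimate is adjusted only when the action taken is greedy.
    best_here = max(q[(s, b)] for b in ACTIONS)
    if q[(s, a)] == best_here:
        best_next = max(q[(s_next, b)] for b in ACTIONS)
        rho += beta * (r - rho + best_next - best_here)
    return rho


# Usage: one update from an all-zero table (states are hypothetical labels).
q_table = defaultdict(float)
rho_estimate = r_learning_step(q_table, 0.0, "s0", 1, 1.0, "s1", alpha=0.5, beta=0.1)
```

Unlike discounted Q-learning, the update targets the long-run average reward, which is a natural fit when the objective is a long-run rate such as timely throughput.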

[1] Paul Tilghman, et al. Will rule the airwaves: A DARPA grand challenge seeks autonomous radios to manage the wireless spectrum, 2019, IEEE Spectrum.

[2] Mehdi Bennis, et al. Toward Low-Latency and Ultra-Reliable Virtual Reality, 2018, IEEE Network.

[3] Ying-Chang Liang, et al. Applications of Deep Reinforcement Learning in Communications and Networking: A Survey, 2018, IEEE Communications Surveys & Tutorials.

[4] Panganamala Ramana Kumar, et al. Cyber–Physical Systems: A Perspective at the Centennial, 2012, Proceedings of the IEEE.

[5] Yan Zhang, et al. Modeling Prioritized Broadcasting in Multichannel Vehicular Networks, 2012, IEEE Transactions on Vehicular Technology.

[6] A. Hordijk, et al. Linear Programming and Markov Decision Chains, 1979.

[7] Soung Chang Liew, et al. Deep-Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks, 2017, 2018 IEEE International Conference on Communications (ICC).

[8] Gerhard P. Fettweis, et al. The Tactile Internet: Applications and Challenges, 2014, IEEE Vehicular Technology Magazine.

[9] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[10] Lei Deng, et al. Scheduling Algorithms for Wireless Downlink with Deadline and Retransmission Constraints, 2020, 2020 IEEE 20th International Conference on Communication Technology (ICCT).

[11] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.

[12] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.

[13] Gerhard Fettweis, et al. Wireless Networked Multirobot Systems in Smart Factories, 2021, Proceedings of the IEEE.

[14] Jun Li, et al. Achieving Maximum Reliability in Deadline-Constrained Random Access With Multiple-Packet Reception, 2019, IEEE Transactions on Vehicular Technology.

[15] Lei Deng, et al. Timely Wireless Flows With General Traffic Patterns: Capacity Region and Scheduling Algorithms, 2017, IEEE/ACM Transactions on Networking.

[16] Yunghsiang Sam Han, et al. On the Asymptotic Performance of Delay-Constrained Slotted ALOHA, 2018, 2018 27th International Conference on Computer Communication and Networks (ICCCN).

[17] Soung Chang Liew, et al. Non-Uniform Time-Step Deep Q-Network for Carrier-Sense Multiple Access in Heterogeneous Wireless Networks, 2019, IEEE Transactions on Mobile Computing.