Deep Reinforcement Learning-Based Hierarchical Time Division Duplexing Control for Dense Wireless and Mobile Networks

Future wireless and mobile network services must accommodate highly dynamic downlink and uplink traffic asymmetry. To meet this requirement, the 3rd Generation Partnership Project (3GPP) introduced the enhanced interference mitigation and traffic adaptation strategy in addition to dynamic time division duplexing (TDD). In this study, we develop a reinforcement learning (RL)-based dynamic TDD framework that effectively controls interference and serves diverse traffic demands. First, we introduce an interference-penalty model that evaluates interference indirectly from the duplexing policy, which significantly reduces the overhead of measuring and exchanging channel information in dense networks. Second, we design a new mixed-reward model that combines the achievable data rate with the implicit interference penalty. Third, we implement deep RL algorithms that base stations (BSs) use to learn their radio frame configurations (RFCs). The training process at each BS accounts for its traffic demand and the RFCs of the surrounding BSs. The BSs are coordinated through a single-leader, multi-follower Stackelberg game, which yields a global RFC setup that maximizes the data rate while minimizing interference. Extensive simulations show that the proposed framework converges stably in various environments and achieves near-optimal performance, reaching 95% or more of the full-search-based optimum, which is 48.84%, 41.92%, and 62.11% higher than the currently utilized random RFC, fixed RFC, and traffic-matched RFC approaches, respectively.
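
The mixed-reward idea described above can be illustrated with a minimal sketch under simplified assumptions: each RFC is a string of downlink/uplink subframes, and the implicit interference penalty simply counts subframes whose direction conflicts with a neighbor's RFC. The function and parameter names (e.g., `mixed_reward`, `penalty_weight`) are illustrative and not taken from the paper.

```python
# Minimal sketch of a mixed reward: achievable rate minus an RFC-based
# (implicit) interference penalty. Assumes a simplified system model.

def mixed_reward(own_rfc, neighbor_rfcs, dl_rate, ul_rate, penalty_weight=0.5):
    """own_rfc / neighbor_rfcs: strings over {'D', 'U'}, one symbol per subframe.
    dl_rate / ul_rate: achievable per-subframe data rates at this BS."""
    # Achievable-rate term: sum the rate of each scheduled subframe direction.
    rate = sum(dl_rate if s == 'D' else ul_rate for s in own_rfc)

    # Implicit interference penalty: count subframes in which this BS's
    # direction opposes a neighbor's direction (a proxy for cross-link
    # interference), so no per-subframe channel measurements are exchanged.
    conflicts = sum(
        1
        for i, s in enumerate(own_rfc)
        for n_rfc in neighbor_rfcs
        if n_rfc[i] != s
    )
    return rate - penalty_weight * conflicts


# Example: a 5-subframe RFC evaluated against two neighboring BSs.
print(mixed_reward("DDDUU", ["DDDDU", "DDUUU"], dl_rate=1.0, ul_rate=0.8))
```

In a deep RL setting, this scalar would serve as the per-step reward that each BS maximizes when selecting its RFC, with the leader and follower BSs of the Stackelberg game updating their choices in turn.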