Reducing Transmission Delay in EDCA Using Policy Gradient Reinforcement Learning

Towards ultra-reliable and low-latency communications, this paper proposes a packet mapping algorithm for the enhanced distributed channel access (EDCA) scheme using policy gradient reinforcement learning (RL). The EDCA scheme grants higher-priority packets more transmission opportunities by mapping packets to predefined access categories (ACs), thereby supporting higher quality of service in wireless local area networks. This paper notes that mapping high-priority packets to lower-priority ACs can reduce the one-packet delay of a high-priority packet. However, such a mapping algorithm cannot minimize the multiple-packets delay because it decides based only on the current status; from a long-term perspective, the mapping of high-priority packets must account for collisions in order to minimize the multiple-packets delay. As a solution, this paper proposes a new mapping algorithm using RL, which is well suited to maximizing rewards from a long-term perspective. The key idea is to design the state to include the number of packets that have arrived at each AC in the past, an indicator of past status. In the designed RL task, the reward, i.e., the multiple-packets delay, depends on the overall sequence of states and actions; hence, recursive value-function-based RL algorithms are not applicable. To solve this problem, this paper adopts policy gradient RL, which learns the packet mapping policy from an overall state-action sequence and the consequent multiple-packets delay. Simulation results reveal that the transmission delay of the proposed mapping algorithm is 13.8% shorter than that of the conventional EDCA mapping algorithm.
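
To make the policy-gradient formulation concrete, the following is a minimal sketch, assuming a linear softmax policy, a toy per-AC delay model, and a REINFORCE-style update; none of these choices come from the paper itself, and all names and hyperparameters are illustrative. It shows how a packet-to-AC mapping policy can be trained from an overall state-action sequence and the resulting episode-level (multiple-packets) delay, rather than from recursive value-function updates.

```python
# A minimal REINFORCE-style sketch of the mapping idea described above, NOT the
# paper's implementation: the delay model, state encoding, and all names and
# hyperparameters below are illustrative assumptions.
import numpy as np

N_ACS = 4     # EDCA access categories (AC_BK, AC_BE, AC_VI, AC_VO)
HISTORY = 8   # length of the past-arrival window forming the state (assumed)
ALPHA = 0.01  # learning rate (assumed)

rng = np.random.default_rng(0)
theta = np.zeros((N_ACS, HISTORY))  # parameters of a linear softmax policy


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def policy(state):
    """Probability of mapping the arriving packet to each AC."""
    return softmax(theta @ state)


def run_episode(n_packets=20):
    """Map n_packets and return the state-action trajectory and total delay.

    The delay model is a toy stand-in: piling packets onto one AC inflates
    its delay, loosely mimicking contention and collisions."""
    queue = np.zeros(N_ACS)
    history = np.zeros(HISTORY)  # recent arrival indicators (assumed encoding)
    traj, total_delay = [], 0.0
    for _ in range(n_packets):
        state = history / (1.0 + history.sum())
        ac = rng.choice(N_ACS, p=policy(state))
        queue[ac] += 1.0
        total_delay += (ac + 1) * 0.5 + 0.1 * queue[ac] ** 2  # toy delay model
        traj.append((state, ac))
        history = np.roll(history, 1)
        history[0] = 1.0
        queue *= 0.8  # queues partially drain between arrivals (assumed)
    return traj, total_delay


baseline = 0.0
for episode in range(500):
    traj, total_delay = run_episode()
    ret = -total_delay                   # reward = negative multiple-packets delay
    baseline += 0.05 * (ret - baseline)  # running baseline for variance reduction
    adv = ret - baseline
    # REINFORCE: scale the log-probability gradient of every action in the
    # episode by the episode-level advantage; no recursive value function.
    for state, ac in traj:
        probs = policy(state)
        grad = np.outer(np.eye(N_ACS)[ac] - probs, state)
        theta += ALPHA * adv * grad
```

The running baseline is a standard variance-reduction device for REINFORCE, and the state here is a simple normalized window of past arrivals, echoing the abstract's idea of encoding how many packets arrived in the past; the paper's actual state design, reward, and simulator are more involved.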