Reducing Transmission Delay in EDCA Using Policy Gradient Reinforcement Learning

Towards ultra-reliable and low-latency communications, this paper proposes a packet mapping algorithm for the enhanced distributed channel access (EDCA) scheme using policy gradient reinforcement learning (RL). The EDCA scheme grants higher-priority packets more transmission opportunities by mapping packets to predefined access categories (ACs), thereby supporting higher quality of service in wireless local area networks. This paper notes that mapping high-priority packets to lower-priority ACs can reduce the one-packet delay of a high-priority packet. However, such a mapping algorithm cannot minimize the multiple-packets delay because it decides based only on the current status; from a long-term perspective, the mapping of high-priority packets must account for collisions in order to minimize the multiple-packets delay. As a solution, this paper proposes a new mapping algorithm using RL, which is well suited to maximizing rewards from a long-term perspective. The key idea is to design the state to include the number of packets that have arrived at each AC in the past, an indicator of past status. In the designed RL task, the reward, i.e., the multiple-packets delay, depends on the overall sequence of states and actions; hence, recursive value-function-based RL algorithms are not applicable. To solve this problem, this paper adopts policy gradient RL, which learns the packet mapping policy from an overall state-action sequence and the consequent multiple-packets delay. Simulation results reveal that the transmission delay of the proposed mapping algorithm is 13.8% shorter than that of the conventional EDCA mapping algorithm.
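
To make the policy-gradient formulation concrete, the following is a minimal sketch, assuming a linear softmax policy, a toy per-AC delay model, and a REINFORCE-style update; none of these choices come from the paper itself, and all names and hyperparameters are illustrative. It shows how a packet-to-AC mapping policy can be trained from an overall state-action sequence and the resulting episode-level (multiple-packets) delay, rather than from recursive value-function updates.

```python
# A minimal REINFORCE-style sketch of the mapping idea described above, NOT the
# paper's implementation: the delay model, state encoding, and all names and
# hyperparameters below are illustrative assumptions.
import numpy as np

N_ACS = 4     # EDCA access categories (AC_BK, AC_BE, AC_VI, AC_VO)
HISTORY = 8   # length of the past-arrival window forming the state (assumed)
ALPHA = 0.01  # learning rate (assumed)

rng = np.random.default_rng(0)
theta = np.zeros((N_ACS, HISTORY))  # parameters of a linear softmax policy


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def policy(state):
    """Probability of mapping the arriving packet to each AC."""
    return softmax(theta @ state)


def run_episode(n_packets=20):
    """Map n_packets and return the state-action trajectory and total delay.

    The delay model is a toy stand-in: piling packets onto one AC inflates
    its delay, loosely mimicking contention and collisions."""
    queue = np.zeros(N_ACS)
    history = np.zeros(HISTORY)  # recent arrival indicators (assumed encoding)
    traj, total_delay = [], 0.0
    for _ in range(n_packets):
        state = history / (1.0 + history.sum())
        ac = rng.choice(N_ACS, p=policy(state))
        queue[ac] += 1.0
        total_delay += (ac + 1) * 0.5 + 0.1 * queue[ac] ** 2  # toy delay model
        traj.append((state, ac))
        history = np.roll(history, 1)
        history[0] = 1.0
        queue *= 0.8  # queues partially drain between arrivals (assumed)
    return traj, total_delay


baseline = 0.0
for episode in range(500):
    traj, total_delay = run_episode()
    ret = -total_delay                   # reward = negative multiple-packets delay
    baseline += 0.05 * (ret - baseline)  # running baseline for variance reduction
    adv = ret - baseline
    # REINFORCE: scale the log-probability gradient of every action in the
    # episode by the episode-level advantage; no recursive value function.
    for state, ac in traj:
        probs = policy(state)
        grad = np.outer(np.eye(N_ACS)[ac] - probs, state)
        theta += ALPHA * adv * grad
```

The running baseline is a standard variance-reduction device for REINFORCE, and the state here is a simple normalized window of past arrivals, echoing the abstract's idea of encoding how many packets arrived in the past; the paper's actual state design, reward, and simulator are more involved.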