Three-Dimensional Trajectory Design for Multi-User MISO UAV Communications: A Deep Reinforcement Learning Approach

In this paper, we investigate a multi-user downlink multiple-input single-output (MISO) unmanned aerial vehicle (UAV) communication system, where a multi-antenna UAV is employed to serve multiple ground terminals. Unlike existing approaches that focus only on a simplified two-dimensional scenario, this paper considers a three-dimensional (3D) urban environment, in which the UAV's 3D trajectory is designed to minimize the data transmission completion time subject to practical throughput and flight-movement constraints. Specifically, we propose a deep reinforcement learning (DRL)-based trajectory design for completion time minimization (DRL-TDCTM), developed from the deep deterministic policy gradient (DDPG) algorithm. In particular, to represent the state information of the UAV and its environment, we introduce an additional piece of information, namely the merged pheromone, which serves as a reference for the reward and facilitates the algorithm design. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm continuously and adaptively learns how to adjust the UAV's movement strategy. Finally, simulation results show the superiority of the proposed DRL-TDCTM algorithm over conventional baseline methods.
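
To make the decision process concrete, the following Python sketch outlines a simplified version of the Markov decision process implied by the abstract: the state combines the UAV's 3D position, the users' remaining data demands, and a merged-pheromone feature; the action is a continuous 3D velocity command (the kind of action space DDPG handles); and the reward pairs a per-step time penalty, reflecting the completion-time objective, with a pheromone-guided progress term. All class names, numerical values, the channel-rate model, and the reward shaping are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

# Minimal sketch of the UAV trajectory MDP described in the abstract.
# Every constant, the rate model, and the reward shaping are assumptions
# made for illustration, not the paper's exact formulation.

class UavTrajectoryEnv:
    def __init__(self, n_users=4, area=500.0, z_min=50.0, z_max=150.0,
                 v_max=20.0, dt=1.0, demand_bits=1e8):
        self.n_users = n_users
        self.area = area                       # square service area side length (m)
        self.z_min, self.z_max = z_min, z_max  # allowed flight altitudes (m)
        self.v_max = v_max                     # maximum UAV speed (m/s)
        self.dt = dt                           # decision interval (s)
        self.demand = np.full(n_users, demand_bits)  # per-user data demand (bits)
        self.reset()

    def reset(self):
        self.pos = np.array([0.0, 0.0, self.z_min])      # UAV 3D position
        self.users = np.random.uniform(0.0, self.area, (self.n_users, 2))
        self.remaining = self.demand.copy()               # bits left per user
        self.pheromone = np.zeros(self.n_users)           # merged-pheromone proxy
        return self._state()

    def _state(self):
        # State: UAV position, normalized remaining demand, and the
        # merged-pheromone feature used as a reward reference in the paper.
        return np.concatenate([self.pos, self.remaining / self.demand, self.pheromone])

    def _rates(self):
        # Placeholder rate model: distance-dependent throughput per user (bits/s).
        d_ground = np.linalg.norm(self.users - self.pos[:2], axis=1)
        d_3d = np.sqrt(d_ground ** 2 + self.pos[2] ** 2)
        return 1e6 * np.log2(1.0 + 1e6 / (d_3d ** 2))

    def step(self, action):
        # Action: 3D velocity command, clipped to the flight-movement constraints.
        v = np.clip(action, -self.v_max, self.v_max)
        self.pos = self.pos + v * self.dt
        self.pos[:2] = np.clip(self.pos[:2], 0.0, self.area)
        self.pos[2] = np.clip(self.pos[2], self.z_min, self.z_max)

        delivered = self._rates() * self.dt
        self.remaining = np.maximum(self.remaining - delivered, 0.0)
        self.pheromone = 0.9 * self.pheromone + 0.1 * (delivered / self.demand)

        done = bool(np.all(self.remaining == 0.0))
        # Reward: a per-step time penalty (completion-time objective) plus a
        # pheromone-guided progress term; the exact shaping is an assumption.
        reward = -1.0 + float(self.pheromone.sum())
        return self._state(), reward, done
```

A DDPG agent would then be trained by repeatedly calling reset() and step() on such an environment, with the actor mapping the state vector to a 3D velocity and the critic scoring state-action pairs, following the standard actor-critic recipe.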
