Intelligent Trajectory Design in UAV-Aided Communications With Reinforcement Learning

In this correspondence paper, we focus on a cellular network aided by an unmanned aerial vehicle (UAV) that serves as an aerial base station for multiple ground users. We investigate the UAV's trajectory design to maximize the expected uplink sum rate without access to user-side information, such as the users' locations and transmit powers, or to channel parameters. The problem is formulated as a Markov decision process and solved with model-free reinforcement learning. Because the action space is continuous and deterministic, the deterministic policy gradient (DPG) algorithm is applied to train the reinforcement learning model. Experimental results show that, owing to the strong generalizability of the reinforcement learning model, the UAV can intelligently track the ground users along the learned trajectory despite being unaware of the user-side information and channel parameters, even when the ground users are mobile. The performance of the learned trajectory is fairly close to that of the optimized trajectory obtained by conventional optimization with this information explicitly known. Moreover, we show that the DPG algorithm converges efficiently within an acceptable training time.
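
To make the formulation above concrete, here is a minimal sketch of how such an MDP and a DPG learner could fit together. It is not the authors' implementation. Everything specific in it is an assumption: the free-space line-of-sight channel with orthogonal (interference-free) uplink access, every numeric constant, the feature map, the tanh-linear actor, and the function names `uplink_sum_rate`, `features`, `policy` and `copdac_q` are all hypothetical. The state is the UAV's horizontal position, the action is a bounded per-slot displacement, and the reward is the uplink sum rate. As the learning rule it uses the compatible off-policy deterministic actor-critic (COPDAC-Q) from Silver et al.'s "Deterministic Policy Gradient Algorithms" (2014), chosen as a simple linear DPG variant. The paper itself may use a different, for example neural-network-based, DPG model.

```python
import numpy as np

# --- Hypothetical scenario parameters (illustrative assumptions only) ---
AREA = 100.0     # side length of the square service area, metres
HEIGHT = 50.0    # fixed UAV altitude, metres
V_MAX = 5.0      # maximum horizontal move per time slot, metres
P_TX = 0.1       # user transmit power, watts (hidden from the agent)
BETA0 = 1e-4     # channel gain at 1 m reference distance (hidden from the agent)
NOISE = 1e-9     # receiver noise power, watts
GAMMA = 0.9      # discount factor


def uplink_sum_rate(uav_xy, users_xy):
    """Reward: sum of log2(1 + SNR) over users, assuming free-space LoS
    channels and orthogonal (interference-free) uplink access."""
    d2 = np.sum((users_xy - uav_xy) ** 2, axis=1) + HEIGHT ** 2
    snr = P_TX * BETA0 / (NOISE * d2)
    return float(np.sum(np.log2(1.0 + snr)))


def features(uav_xy):
    """State features: normalised UAV position plus a bias term."""
    return np.array([uav_xy[0] / AREA, uav_xy[1] / AREA, 1.0])


def policy(theta, f):
    """Deterministic policy mu_theta(s) = V_MAX * tanh(theta @ f).
    Also returns g_i = d(mu_i)/d(theta_i @ f), used to build the Jacobian."""
    t = np.tanh(theta @ f)
    return V_MAX * t, V_MAX * (1.0 - t ** 2)


def copdac_q(users_xy, steps=300, seed=0, a_theta=1e-3, a_w=1e-3, a_v=1e-2, sigma=1.0):
    """COPDAC-Q with a linear-tanh actor. The learner only sees the UAV
    position and the scalar reward, never users_xy or the channel constants."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((2, 3))   # actor parameters
    w = np.zeros((2, 3))       # compatible critic (advantage) weights
    v = np.zeros(3)            # state-value baseline weights
    s = np.array([AREA / 2, AREA / 2])
    for _ in range(steps):
        f = features(s)
        mu, g = policy(theta, f)
        # Behaviour policy: Gaussian exploration around mu, clipped to the speed limit.
        a = np.clip(mu + sigma * rng.standard_normal(2), -V_MAX, V_MAX)
        s_next = np.clip(s + a, 0.0, AREA)
        r = uplink_sum_rate(s_next, users_xy)
        f_next = features(s_next)
        # grad_a Q = (grad_theta mu)^T w, one entry per action dimension.
        grad_a_q = g * (w @ f)
        q_sa = (a - mu) @ grad_a_q + v @ f
        # Q(s', mu(s')) reduces to V(s') because the advantage term vanishes at a = mu.
        delta = r + GAMMA * (v @ f_next) - q_sa
        # Compatible features phi(s, a) = grad_theta mu(s) (a - mu(s)), shaped like theta.
        phi_sa = np.outer(g * (a - mu), f)
        # Actor: theta += a_theta * grad_theta mu(s) grad_a Q(s, a)|_{a = mu(s)}.
        theta += a_theta * np.outer(g * grad_a_q, f)
        w += a_w * delta * phi_sa
        v += a_v * delta * f
        s = s_next
    return theta, w, v


# Usage with hypothetical ground-user positions.
users = np.array([[20.0, 30.0], [80.0, 70.0], [60.0, 20.0]])
theta, w, v = copdac_q(users)
mu_test, _ = policy(theta, features(np.array([10.0, 90.0])))
print("action from (10, 90):", mu_test)
```

The user positions, transmit power and channel constants appear only inside the reward function, which mirrors the model-free setting: the learner adapts from observed rewards alone. Under compatible function approximation, Q^w(s', mu(s')) collapses to the baseline V^v(s'), so the TD target needs no separate critic evaluation of the next action.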
