A Double Q-Learning Approach for Navigation of Aerial Vehicles with Connectivity Constraint

This paper studies the trajectory optimization problem for an aerial vehicle tasked with flying between a given pair of initial and final locations. The objective is to minimize the travel time of the aerial vehicle while ensuring that the communication connectivity constraint required for its safe operation is satisfied. We consider two different criteria for the connectivity constraint, which lead to two different scenarios. In the first scenario, we assume that the maximum continuous time duration the aerial vehicle spends outside the coverage of the ground base stations (GBSs) is limited to a given threshold. In the second scenario, we instead assume that the total time during which the aerial vehicle is not covered by the GBSs is restricted. Based on these two constraints, we formulate two trajectory optimization problems. To solve these non-convex problems, we use an approach based on double Q-learning, a model-free reinforcement learning technique that, unlike existing algorithms, does not require perfect knowledge of the environment. Moreover, in contrast to the well-known Q-learning technique, our double Q-learning algorithm does not suffer from the overestimation issue. Simulation results show that although our algorithm does not require prior information about the environment, it performs well and achieves near-optimal performance.
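To make the core update concrete, the following is a minimal sketch of tabular double Q-learning in the spirit of van Hasselt (2010). The state/action space size, learning rate, discount factor, and exploration parameters are illustrative assumptions, not the paper's actual environment model or reward design (which would encode travel time and the GBS connectivity constraints).

```python
import numpy as np

# Minimal tabular double Q-learning sketch (van Hasselt, 2010).
# n_states/n_actions are placeholder assumptions, e.g. a discretized
# trajectory grid with four movement directions.
n_states, n_actions = 100, 4
alpha, gamma, eps = 0.1, 0.95, 0.1   # assumed hyperparameters

Q_a = np.zeros((n_states, n_actions))
Q_b = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def select_action(s):
    """Epsilon-greedy action selection over the sum of both estimators."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q_a[s] + Q_b[s]))

def update(s, a, r, s_next):
    """Double Q-learning update: one table selects the greedy action,
    the other evaluates it, which removes the overestimation bias
    of standard Q-learning's single max operator."""
    if rng.random() < 0.5:
        a_star = int(np.argmax(Q_a[s_next]))
        Q_a[s, a] += alpha * (r + gamma * Q_b[s_next, a_star] - Q_a[s, a])
    else:
        b_star = int(np.argmax(Q_b[s_next]))
        Q_b[s, a] += alpha * (r + gamma * Q_a[s_next, b_star] - Q_b[s, a])
```

Because the action that maximizes one table is evaluated by the other, random over-estimates in either table are not systematically propagated into the targets, which is the property the abstract contrasts with plain Q-learning.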
