RSS-Based UAV-BS 3-D Mobility Management via Policy Gradient Deep Reinforcement Learning

We address the mobility management of an autonomous UAV-mounted base station (UAV-BS) that provides communication services to a cluster of ground users while the geographical characteristics of the cluster (e.g., its location and boundary), the geographical locations of the users, and the characteristics of the radio environment are all unknown. The UAV-BS exploits only the received signal strengths (RSS) from the users and accordingly chooses its (continuous) 3-D velocity to navigate constructively, i.e., to improve the transmitted data rate. To compensate for the lack of a model, we adopt policy gradient deep reinforcement learning. Since our approach does not rely on any particular information about the users or the radio environment, it is flexible and respects user privacy. Our experiments indicate that, despite the minimal available information, the UAV-BS is able to distinguish between high-rise (often non-line-of-sight dominant) and suburban (mainly line-of-sight dominant) environments: in the former it tends to reduce its height and stay close to the cluster, whereas in the latter it tends to increase its height and stay farther away. We further observe that the choice of the reward function affects the agent's speed and its ability to adhere to the problem constraints, without affecting the delivered data rate.
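
The paper reports no implementation details here, so the following is a minimal, hypothetical sketch of the core idea: a Gaussian policy network that maps a vector of per-user RSS readings to a continuous 3-D velocity, trained with a plain REINFORCE policy-gradient update. The class name RSSPolicy, the hidden-layer sizes, and the speed bound v_max are illustrative assumptions, not the authors' architecture. Illustrative sketch (Python/PyTorch):

import torch
import torch.nn as nn

class RSSPolicy(nn.Module):
    """Gaussian policy: per-user RSS vector -> mean of a 3-D velocity action."""
    def __init__(self, num_users: int, hidden: int = 64, v_max: float = 5.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_users, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 3), nn.Tanh(),          # mean bounded in [-1, 1]^3
        )
        self.log_std = nn.Parameter(torch.zeros(3))   # state-independent exploration noise
        self.v_max = v_max                            # assumed per-axis speed limit

    def dist(self, rss: torch.Tensor) -> torch.distributions.Normal:
        # Scale the bounded network output to the feasible velocity range.
        return torch.distributions.Normal(self.v_max * self.net(rss), self.log_std.exp())

def reinforce_update(policy, optimizer, rss_batch, action_batch, return_batch):
    """One policy-gradient step: ascend E[log pi(a|s) * G] over a batch of transitions."""
    dist = policy.dist(rss_batch)
    log_prob = dist.log_prob(action_batch).sum(dim=-1)  # joint log-prob over the 3 axes
    loss = -(log_prob * return_batch).mean()            # negate for gradient descent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

At each control step the agent would sample a velocity via policy.dist(rss).sample(). In practice, an actor-critic method (e.g., TRPO or DDPG) would replace the raw Monte-Carlo return G with an advantage or critic estimate, and reward terms penalizing constraint violations would realize the reward-shaping comparison described above.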
