Real-Time Energy Harvesting Aided Scheduling in UAV-Assisted D2D Networks Relying on Deep Reinforcement Learning

Unmanned aerial vehicle (UAV)-assisted device-to-device (D2D) communications can be deployed flexibly thanks to the agility of UAVs. By exploiting direct D2D interaction supported by UAVs, both the user experience and the network performance can be substantially enhanced at public events. However, the continuous movement of D2D users and the limited energy and flight time of UAVs are impediments to real-time operation. To tackle this issue, we propose a novel deep reinforcement learning model for finding the optimal energy-harvesting time schedule in UAV-assisted D2D communications. To make the system model more realistic, we assume that the UAV flies around a central point, that the D2D users move continuously according to a random walk model, and that the channel state information encountered during each time slot is randomly time-variant. Our numerical results demonstrate that the proposed schemes outperform the existing solutions. The associated energy-efficiency game can be solved in less than one millisecond on an off-the-shelf processor using the trained neural networks. Hence, our deep reinforcement learning techniques are capable of solving real-time resource-allocation problems in UAV-assisted wireless networks.
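To illustrate the system model described in the abstract, the following is a minimal sketch (not the authors' code) of one simulation episode: a UAV circles a central point, D2D users follow a random walk, the per-slot channel varies randomly, and a trained policy network, stubbed here as a random linear layer, outputs the energy-harvesting time fraction for each slot. All names, dimensions, and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_USERS, N_SLOTS, SLOT_LEN = 4, 100, 1.0            # assumed values
UAV_RADIUS, UAV_HEIGHT, UAV_OMEGA = 50.0, 100.0, 0.05
STEP_STD = 1.5                                      # random-walk step (m per slot)

users = rng.uniform(-100, 100, size=(N_USERS, 2))   # initial D2D user positions
W = rng.normal(scale=0.1, size=(2 * N_USERS + 2,))  # stand-in for a trained policy's weights

def uav_position(t):
    """UAV flies on a circle of radius UAV_RADIUS around the origin at height UAV_HEIGHT."""
    ang = UAV_OMEGA * t
    return np.array([UAV_RADIUS * np.cos(ang), UAV_RADIUS * np.sin(ang), UAV_HEIGHT])

def channel_gains(uav_xyz, users_xy):
    """Time-variant gains: inverse-square path loss times an exponential fading term."""
    d = np.linalg.norm(np.c_[users_xy, np.zeros(N_USERS)] - uav_xyz, axis=1)
    return rng.exponential(1.0, N_USERS) / d**2

def policy(state):
    """Stub for the trained actor network: maps the state to a harvesting fraction in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-state @ W))

for t in range(N_SLOTS):
    users += rng.normal(scale=STEP_STD, size=users.shape)       # random-walk mobility
    uav = uav_position(t)
    g = channel_gains(uav, users)
    state = np.concatenate([users.ravel(), [np.cos(UAV_OMEGA * t), np.sin(UAV_OMEGA * t)]])
    tau = policy(state)                                          # harvesting time fraction
    harvest_t, transmit_t = tau * SLOT_LEN, (1 - tau) * SLOT_LEN
    energy = harvest_t * g                                       # toy linear harvesting model
    rate = transmit_t * np.log2(1 + energy / transmit_t)         # toy per-user throughput

print("final slot: tau=%.3f, sum-rate=%.3f" % (tau, rate.sum()))
```

Because the policy network is already trained, each slot's scheduling decision reduces to a single forward pass, which is what makes the sub-millisecond inference time claimed in the abstract plausible.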
