Distributed Energy-Efficient Multi-UAV Navigation for Long-Term Communication Coverage by Deep Reinforcement Learning

In this paper, we aim to design a fully distributed control solution that navigates a group of unmanned aerial vehicles (UAVs), acting as mobile base stations (BSs), to fly around a target area and provide long-term communication coverage for ground mobile users. Unlike existing solutions that mainly address the problem from an optimization perspective, we propose a decentralized deep reinforcement learning (DRL) based framework to control each UAV in a distributed manner. Our goal is to maximize the temporal average coverage score achieved by all UAVs in a task, maximize the geographical fairness over all considered points of interest (PoIs), and minimize the total energy consumption, while keeping the UAVs connected and within the area border. We design the state, observation, action space, and reward in an explicit manner, and model each UAV with deep neural networks (DNNs). Through extensive simulations, we identify an appropriate set of hyperparameters, including the experience replay buffer size, the number of units in the two fully connected hidden layers of the actor, critic, and their target networks, and the discount factor applied to future rewards. The simulation results demonstrate the superiority of the proposed model over the state-of-the-art DRL-EC³ approach based on deep deterministic policy gradient (DDPG), as well as three other baselines.
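The abstract names the three reward objectives (coverage, geographical fairness, energy) and an experience replay buffer, but does not give their exact definitions. As a minimal sketch only, the snippet below assumes Jain's fairness index over per-PoI coverage scores and a per-step reward that scales fairness-weighted coverage by the energy spent; the function names, the reward form, and the buffer interface are illustrative assumptions, not the paper's actual formulation.

```python
import random
from collections import deque

def jains_fairness(poi_coverage):
    """Jain's fairness index over per-PoI coverage scores (1.0 = perfectly fair)."""
    n = len(poi_coverage)
    total = sum(poi_coverage)
    sq_sum = sum(c * c for c in poi_coverage)
    return (total * total) / (n * sq_sum) if sq_sum > 0 else 0.0

def step_reward(poi_coverage, energy_used, eps=1e-6):
    """Illustrative per-step reward: mean coverage weighted by fairness,
    divided by the energy consumed in the step (eps avoids division by zero)."""
    mean_cov = sum(poi_coverage) / len(poi_coverage)
    return (jains_fairness(poi_coverage) * mean_cov) / (energy_used + eps)

class ReplayBuffer:
    """Fixed-size experience replay buffer, as used by actor-critic DRL methods
    such as DDPG; old transitions are evicted once capacity is reached."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch for a training update.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

Under these assumptions, equal per-PoI coverage yields a fairness index of 1.0, so the reward reduces to mean coverage over energy; uneven coverage is penalized through the fairness factor.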
