Multi-UAV Path Planning for Wireless Data Harvesting with Deep Reinforcement Learning

Harvesting data from distributed Internet of Things (IoT) devices with multiple autonomous unmanned aerial vehicles (UAVs) is a challenging problem requiring flexible path planning methods. We propose a multi-agent reinforcement learning (MARL) approach that, in contrast to previous work, can adapt to profound changes in the scenario parameters defining the data harvesting mission, such as the number of deployed UAVs, the number and positions of IoT devices, or the maximum flying time, without the need to perform expensive recomputations or relearn control policies. We formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying time and collision avoidance constraints. The path planning problem is translated into a decentralized partially observable Markov decision process (Dec-POMDP), which we solve by training a double deep Q-network (DDQN) to approximate the optimal UAV control policy. By feeding global-local maps of the environment into the agents' convolutional layers, we show that the proposed network architecture enables the agents to cooperate effectively by carefully dividing the data collection task among themselves, to scale to large state spaces, and to make movement decisions that balance data collection goals, flight-time efficiency, and navigation constraints.
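The key ingredient that distinguishes DDQN from standard Q-learning is its decoupled bootstrap target: the online estimator selects the greedy next action, while a separate target estimator evaluates it, reducing overestimation bias. The following minimal tabular sketch illustrates only that target computation; the function name and dictionary-based Q-tables are hypothetical stand-ins for the paper's deep networks, not the authors' implementation.

```python
def double_q_target(q_online, q_target, next_state, reward, gamma, done):
    """Compute the double Q-learning bootstrap target for one transition.

    q_online / q_target: dicts mapping state -> {action: value}, a tabular
    stand-in for the online and target networks of a DDQN.
    """
    if done:
        # Terminal transition: no bootstrapping, target is the bare reward.
        return reward
    # Action selection uses the ONLINE estimator (greedy action) ...
    best_action = max(q_online[next_state], key=q_online[next_state].get)
    # ... but that action is EVALUATED with the TARGET estimator,
    # which is what mitigates the max-operator overestimation bias.
    return reward + gamma * q_target[next_state][best_action]
```

For example, if the online table prefers an action that the target table values conservatively, the target value (not the online one) enters the bootstrap, so a single noisy overestimate in the online network cannot inflate the learning target.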
