Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV Based Random Access IoT Networks With NOMA

In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique to improve massive channel access in a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers. Specifically, IoT devices contend for the shared wireless channel using an adaptive $p$-persistent slotted Aloha protocol, and the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to decode multiple concurrent transmissions from IoT devices, improving access efficiency. To enable an energy-sustainable, capacity-optimal network, we study the joint problem of dynamic multi-UAV altitude control and multi-cell wireless channel access management of IoT devices as a stochastic control problem with multiple energy constraints. To learn an optimal control policy, we first formulate this problem as a Constrained Markov Decision Process (CMDP), and then propose an online, model-free Constrained Deep Reinforcement Learning (CDRL) algorithm based on Lagrangian primal-dual policy optimization to solve the CMDP. Extensive simulations demonstrate that our proposed algorithm learns a cooperative policy among UAVs in which the altitudes of the UAVs and the channel access probabilities of the IoT devices are dynamically and jointly controlled to attain the maximal long-term network capacity while maintaining the energy sustainability of the UAVs. The proposed algorithm outperforms deep RL baselines that use reward shaping to account for energy costs: it achieves a temporal average system capacity that is $82.4\%$ higher than that of a feasible DRL-based solution, and only $6.47\%$ lower than that of the energy-constraint-free system.
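To make the access model concrete, the following is a minimal Monte Carlo sketch (not the paper's simulator) of why SIC-based multi-packet reception improves $p$-persistent slotted Aloha. It assumes a simplified collision model in which a slot with $k$ simultaneous transmitters yields $k$ successes whenever $k$ is at most the SIC decoding depth, and zero otherwise; the function name and parameters are illustrative.

```python
import random

def aloha_sic_throughput(n_devices, p, sic_depth, n_slots=100_000, seed=0):
    """Monte Carlo estimate of per-slot throughput for p-persistent slotted
    Aloha with a SIC receiver that can decode up to `sic_depth` overlapping
    transmissions (simplified model: k <= sic_depth transmitters in a slot
    give k successes, more than sic_depth give a collision)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_slots):
        # Each device independently transmits with probability p this slot.
        k = sum(rng.random() < p for _ in range(n_devices))
        if 0 < k <= sic_depth:
            successes += k
    return successes / n_slots

# With sic_depth=1 this reduces to classical slotted Aloha, whose throughput
# peaks near 1/e when p is tuned to 1/n; a deeper SIC raises the achievable
# throughput, which is the gain NOMA-based random access exploits.
t_plain = aloha_sic_throughput(50, 1 / 50, sic_depth=1)
t_sic = aloha_sic_throughput(50, 2 / 50, sic_depth=2)
print(t_plain, t_sic)
```

Under this toy model the optimal access probability also depends on the SIC depth, which motivates adapting $p$ jointly with the UAV configuration rather than fixing it per device.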

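The Lagrangian primal-dual idea behind the CDRL algorithm can be illustrated on a toy constrained problem (this is a scalar sketch of the multiplier dynamics, not the paper's actor-critic algorithm; all names and step sizes are illustrative): maximize $r(\theta)=\theta$ subject to $c(\theta)=\theta^2 \le 0.25$. The primal variable ascends the Lagrangian $L(\theta,\lambda)=\theta-\lambda(\theta^2-0.25)$ while the dual variable ascends the constraint violation and is projected to stay non-negative.

```python
def primal_dual_toy(iters=20_000, eta_p=0.05, eta_d=0.01):
    """Lagrangian primal-dual gradient dynamics on:
        maximize r(theta) = theta   s.t.   c(theta) = theta**2 <= 0.25.
    In CDRL the primal step is a policy-gradient update on the Lagrangian
    and the dual step uses an estimate of the accumulated constraint cost;
    here both are exact gradients of the scalar toy problem."""
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        # Primal ascent on L(theta, lam) = theta - lam * (theta**2 - 0.25),
        # with theta clipped to a valid range (e.g. a probability-like knob).
        theta += eta_p * (1.0 - 2.0 * lam * theta)
        theta = min(max(theta, 0.0), 1.0)
        # Dual ascent on the constraint violation, projected to lam >= 0.
        lam = max(0.0, lam + eta_d * (theta ** 2 - 0.25))
    return theta, lam
```

At the saddle point the constraint is active ($\theta^\star=0.5$) and the multiplier settles at $\lambda^\star=1$, so the dual variable effectively prices the energy constraint into the reward, which is exactly the role it plays in the constrained policy optimization described above.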