Constrained Deep Reinforcement Learning for Energy Sustainable Multi-UAV Based Random Access IoT Networks With NOMA

In this paper, we apply the Non-Orthogonal Multiple Access (NOMA) technique to improve massive channel access in a wireless IoT network where solar-powered Unmanned Aerial Vehicles (UAVs) relay data from IoT devices to remote servers. Specifically, IoT devices contend for the shared wireless channel using an adaptive $p$-persistent slotted Aloha protocol, and the solar-powered UAVs adopt Successive Interference Cancellation (SIC) to decode multiple concurrent transmissions from IoT devices, improving access efficiency. To enable an energy-sustainable, capacity-optimal network, we study the joint problem of dynamic multi-UAV altitude control and multi-cell wireless channel access management of IoT devices as a stochastic control problem with multiple energy constraints. To learn an optimal control policy, we first formulate this problem as a Constrained Markov Decision Process (CMDP), and then propose an online, model-free Constrained Deep Reinforcement Learning (CDRL) algorithm based on Lagrangian primal-dual policy optimization to solve the CMDP. Extensive simulations demonstrate that our proposed algorithm learns a cooperative policy among UAVs in which the altitudes of the UAVs and the channel access probabilities of the IoT devices are dynamically and jointly controlled to attain the maximal long-term network capacity while maintaining the energy sustainability of the UAVs. The proposed algorithm outperforms deep RL baselines that use reward shaping to account for energy costs: it achieves a temporal average system capacity that is $82.4\%$ higher than that of a feasible DRL-based solution, and only $6.47\%$ lower than that of the energy-constraint-free system.
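To make the access model concrete, the following is a minimal Monte Carlo sketch (not the paper's simulator) of why SIC-based multi-packet reception improves $p$-persistent slotted Aloha. It assumes a simplified collision model in which a slot with $k$ simultaneous transmitters yields $k$ successes whenever $k$ is at most the SIC decoding depth, and zero otherwise; the function name and parameters are illustrative.

```python
import random

def aloha_sic_throughput(n_devices, p, sic_depth, n_slots=100_000, seed=0):
    """Monte Carlo estimate of per-slot throughput for p-persistent slotted
    Aloha with a SIC receiver that can decode up to `sic_depth` overlapping
    transmissions (simplified model: k <= sic_depth transmitters in a slot
    give k successes, more than sic_depth give a collision)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_slots):
        # Each device independently transmits with probability p this slot.
        k = sum(rng.random() < p for _ in range(n_devices))
        if 0 < k <= sic_depth:
            successes += k
    return successes / n_slots

# With sic_depth=1 this reduces to classical slotted Aloha, whose throughput
# peaks near 1/e when p is tuned to 1/n; a deeper SIC raises the achievable
# throughput, which is the gain NOMA-based random access exploits.
t_plain = aloha_sic_throughput(50, 1 / 50, sic_depth=1)
t_sic = aloha_sic_throughput(50, 2 / 50, sic_depth=2)
print(t_plain, t_sic)
```

Under this toy model the optimal access probability also depends on the SIC depth, which motivates adapting $p$ jointly with the UAV configuration rather than fixing it per device.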

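The Lagrangian primal-dual idea behind the CDRL algorithm can be illustrated on a toy constrained problem (this is a scalar sketch of the multiplier dynamics, not the paper's actor-critic algorithm; all names and step sizes are illustrative): maximize $r(\theta)=\theta$ subject to $c(\theta)=\theta^2 \le 0.25$. The primal variable ascends the Lagrangian $L(\theta,\lambda)=\theta-\lambda(\theta^2-0.25)$ while the dual variable ascends the constraint violation and is projected to stay non-negative.

```python
def primal_dual_toy(iters=20_000, eta_p=0.05, eta_d=0.01):
    """Lagrangian primal-dual gradient dynamics on:
        maximize r(theta) = theta   s.t.   c(theta) = theta**2 <= 0.25.
    In CDRL the primal step is a policy-gradient update on the Lagrangian
    and the dual step uses an estimate of the accumulated constraint cost;
    here both are exact gradients of the scalar toy problem."""
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        # Primal ascent on L(theta, lam) = theta - lam * (theta**2 - 0.25),
        # with theta clipped to a valid range (e.g. a probability-like knob).
        theta += eta_p * (1.0 - 2.0 * lam * theta)
        theta = min(max(theta, 0.0), 1.0)
        # Dual ascent on the constraint violation, projected to lam >= 0.
        lam = max(0.0, lam + eta_d * (theta ** 2 - 0.25))
    return theta, lam
```

At the saddle point the constraint is active ($\theta^\star=0.5$) and the multiplier settles at $\lambda^\star=1$, so the dual variable effectively prices the energy constraint into the reward, which is exactly the role it plays in the constrained policy optimization described above.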