Actor-Critic Deep Reinforcement Learning for Energy Minimization in UAV-Aided Networks

In this paper, we investigate a user-timeslot scheduling problem for downlink unmanned aerial vehicle (UAV)-aided networks, where the UAV serves as an aerial base station. We formulate an optimization problem that jointly determines user scheduling and hovering time to minimize the UAV’s transmission and hovering energy. We propose an offline algorithm that solves the problem using the branch and bound method and golden section search. However, the computational time of the offline algorithm grows exponentially. Therefore, we apply a deep reinforcement learning (DRL) method to design an online algorithm with less computational time. To this end, we first reformulate the original user scheduling problem as a Markov decision process (MDP). Then, an actor-critic-based RL algorithm is developed to determine the scheduling policy under the guidance of two deep neural networks. Numerical results show that the proposed online algorithm obtains a good tradeoff between performance gain and computational time.
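As a rough illustration of the golden section search named above, the following sketch minimizes a one-dimensional energy function over the hovering time. The abstract does not give the energy model or say which variable the search is applied to, so everything specific here is an assumption: the function `total_energy`, its constants (`HOVER_POWER`, `DATA_BITS`, `BANDWIDTH`, `NOISE_OVER_GAIN`), and the search interval are hypothetical and chosen only so that the objective is unimodal.

```python
import math

# Hypothetical constants for illustration only; none come from the paper.
HOVER_POWER = 100.0      # W, power drawn while hovering
DATA_BITS = 1e6          # bits to deliver to one user
BANDWIDTH = 1e6          # Hz
NOISE_OVER_GAIN = 10.0   # W, noise power divided by channel gain


def total_energy(t):
    """Assumed energy model: hovering energy plus transmit energy for time t."""
    # Transmit power needed to deliver DATA_BITS in t seconds (Shannon rate).
    tx_power = (2.0 ** (DATA_BITS / (BANDWIDTH * t)) - 1.0) * NOISE_OVER_GAIN
    return HOVER_POWER * t + tx_power * t


def golden_section_search(f, lo, hi, tol=1e-6):
    """Return an approximate minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Search an assumed feasible range of hovering times, in seconds.
best_t = golden_section_search(total_energy, 0.05, 5.0)
best_energy = total_energy(best_t)
```

Each iteration reuses one of the two previous function evaluations, so the interval shrinks by a factor of about 0.618 per new evaluation. In the setting the abstract describes, a search of this kind would presumably run inside the branch and bound procedure for each candidate schedule, though the abstract does not spell out that coupling.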
