Joint EH Time and Transmit Power Optimization Based on DDPG for EH Communications

This letter considers joint energy management and power allocation for energy harvesting (EH) communications. We formulate a joint optimization problem over the continuous EH time and transmit power to maximize the long-term throughput, and solve it with an approach based on deep deterministic policy gradient (DDPG). Optimizing both variables jointly, however, leads to a large continuous action space. To reduce the dimension of the action space, we present a deep reinforcement learning (DRL) framework that combines DDPG with convex optimization. Using the primal decomposition method, the original problem is decomposed into a two-layer optimization problem: the upper-layer (master) problem is solved by DDPG over a low-dimensional action space, and the lower-layer subproblem is solved with an existing convex optimization toolbox. Numerical simulation results show that the proposed DRL framework achieves higher long-term throughput than existing energy management or power allocation policies for EH communications.
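The two-layer structure described in the abstract can be illustrated with a minimal single-slot sketch. Everything below is an assumption made for illustration and is not taken from the letter: the harvest-then-transmit slot model, the parallel-channel rate function, the names `water_filling`, `lower_layer`, and `upper_layer`, and all parameter values. In particular, a simple grid search stands in for the DDPG agent in the upper layer, and a bisection-based water-filling routine stands in for the convex toolbox in the lower layer, so the sketch shows only the decomposition and not the learning of a long-term policy.

```python
import math


def water_filling(gains, power_budget, iters=100):
    """Maximize sum(log2(1 + g * p)) subject to sum(p) <= power_budget, p >= 0.

    Solved by bisection on the water level; a stand-in for a convex toolbox.
    """
    if power_budget <= 0:
        return [0.0 for _ in gains]
    # At the upper bracket the allocated power is at least the budget.
    lo, hi = 0.0, power_budget + max(1.0 / g for g in gains)
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        used = sum(max(0.0, level - 1.0 / g) for g in gains)
        if used > power_budget:
            hi = level
        else:
            lo = level
    # Use the lower bracket so the budget is never exceeded.
    return [max(0.0, lo - 1.0 / g) for g in gains]


def lower_layer(tau, gains, harvest_power, battery):
    """Lower layer: best throughput for a FIXED EH time fraction tau.

    Assumed model: a unit-length slot, tau spent harvesting at rate
    harvest_power, and the remaining 1 - tau spent transmitting.
    """
    tx_time = 1.0 - tau
    if tx_time <= 0.0:
        return 0.0, [0.0 for _ in gains]
    energy = battery + tau * harvest_power
    powers = water_filling(gains, energy / tx_time)
    rate = tx_time * sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers))
    return rate, powers


def upper_layer(gains, harvest_power, battery, grid=101):
    """Upper layer: choose tau. Grid search here; a DDPG actor in the letter."""
    best = (0.0, -1.0, [])
    for i in range(grid):
        tau = i / (grid - 1)
        rate, powers = lower_layer(tau, gains, harvest_power, battery)
        if rate > best[1]:
            best = (tau, rate, powers)
    return best
```

Calling `upper_layer([2.0, 1.0, 0.25], harvest_power=4.0, battery=0.0)` returns a tuple `(tau, rate, powers)`. The point of the decomposition is that the upper layer only ever searches over the scalar `tau`, because the per-channel powers are recovered by the lower layer for each candidate value.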
