Deep Reinforcement Learning Optimal Transmission Policy for Communication Systems With Energy Harvesting and Adaptive MQAM

In this paper, we study the optimal transmission problem in a point-to-point wireless communication system whose transmitter harvests energy and has a limited battery. Because the distributions of the energy arrival process and the channel coefficient are not known a priori, we propose a deep reinforcement learning (DRL) based policy that allocates transmission power and adaptively adjusts the M-ary modulation level according to causal information on the harvested energy, battery state, and channel gain, with the goal of maximizing system throughput. The optimization problem is formulated as a Markov decision process with unknown state transition probabilities. Applying the principle of DRL, we use a deep Q-network to find the optimal solution in a continuous state space, which provides rapid convergence since no additional memory is required. Simulation results show that the proposed policy is effective and improves system throughput compared with Q-learning, greedy, random, and constant modulation level transmission policies.
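As an illustration of the Markov decision process described above, the following is a minimal sketch of a single time slot, not the paper's actual model. The state is (battery, harvested energy, channel gain) and the action is (transmit power, modulation level M). The battery capacity, target bit error rate, unit slot length, unit noise power, harvest-then-store ordering, and the function names are all assumptions introduced here. The per-slot reward uses the well-known MQAM approximation BER ≈ 0.2 exp(-1.5 SNR / (M - 1)) to decide whether a modulation level is supported.

```python
import math

B_MAX = 10.0        # assumed battery capacity (energy units)
TARGET_BER = 1e-3   # assumed target bit error rate


def required_snr(m, ber=TARGET_BER):
    """SNR needed by M-QAM to meet `ber`, from BER ~ 0.2*exp(-1.5*snr/(m-1))."""
    return -(m - 1) * math.log(5.0 * ber) / 1.5


def step(battery, harvested, gain, power, m):
    """Simulate one slot (hypothetical model).

    Returns (reward, next_battery), where reward is bits per symbol
    delivered in this slot and next_battery is the stored energy
    available at the start of the next slot.
    """
    power = min(power, battery)   # cannot spend more than is stored
    snr = power * gain            # noise power normalized to 1 (assumption)
    supported = power > 0 and snr >= required_snr(m)
    reward = math.log2(m) if supported else 0.0
    # Energy harvested in this slot becomes usable in the next one,
    # and anything above the battery capacity is lost.
    next_battery = min(battery - power + harvested, B_MAX)
    return reward, next_battery
```

A learning agent such as a deep Q-network would call a function like `step` repeatedly and observe only the resulting rewards and states, without knowing how `harvested` and `gain` are distributed.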
