Reinforcement Learning for Energy Harvesting Decode-and-Forward Two-Hop Communications

Energy harvesting (EH) two-hop communications are considered. The transmitter and the relay harvest energy from the environment and use it exclusively for transmitting data. A data arrival process is assumed at the transmitter, and the relay stores the received data in a finite data buffer. We consider a realistic scenario in which the EH nodes have only local causal knowledge, i.e., at any time instant, each EH node knows only the current values of its EH process, channel state, and data arrival process. Our goal is to find a power allocation policy that maximizes the throughput at the receiver. We show that, because the EH nodes have only local causal knowledge, the two-hop communication problem can be separated into two point-to-point problems. Consequently, an independent power allocation problem is solved at each EH node. To find the power allocation policy, reinforcement learning with linear function approximation is applied. Moreover, two feature functions that take the data arrival process into account are introduced to perform the function approximation. Numerical results show that the proposed approach suffers only a small performance degradation compared to the offline optimum case. Furthermore, we show that the proposed feature functions achieve better performance than standard approximation techniques.
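The abstract states that each EH node solves its own point-to-point power allocation problem using reinforcement learning with linear function approximation. The sketch below illustrates that idea for a single node with semi-gradient SARSA, where the action value is approximated as Q(s, p) = w · phi(s, p). Everything specific here is a hypothetical assumption for illustration: the battery and buffer sizes, the i.i.d. harvest, arrival and channel distributions, the discrete power levels, the Shannon-rate reward, and the `features` vector (which is not the pair of feature functions proposed in the paper). SARSA is used as one standard on-policy algorithm, not necessarily the one the authors apply.

```python
import math
import random

# --- Hypothetical single-node model (illustrative assumptions only) ---------
BATTERY_MAX = 5                 # battery capacity (energy units)
BUFFER_MAX = 5.0                # data buffer capacity (data units)
POWERS = (0, 1, 2, 3)           # discrete transmit power levels (energy units)
CHANNELS = (0.5, 1.0, 2.0)      # i.i.d. channel gains, drawn uniformly
HARVEST = (0, 1, 2)             # i.i.d. harvested energy per slot
ARRIVALS = (0.0, 1.0, 2.0)      # i.i.d. data arrivals per slot


def feasible_powers(state):
    """Energy causality: a node cannot spend more than its stored energy."""
    battery, _, _ = state
    return [p for p in POWERS if p <= battery]


def throughput(state, power):
    """Data sent in one slot: Shannon rate, capped by the buffered data."""
    _, buffer, h = state
    return min(math.log2(1 + power * h), buffer)


def features(state, power):
    """Hypothetical feature vector phi(s, p); Q(s, p) = w . phi(s, p)."""
    battery, buffer, h = state
    rate = math.log2(1 + power * h)
    return [1.0,
            (battery - power) / BATTERY_MAX,   # energy left after transmitting
            buffer / BUFFER_MAX,               # data waiting in the buffer
            rate,                              # unconstrained rate
            min(rate, buffer)]                 # rate limited by available data


def q_value(w, state, power):
    """Linear action-value approximation."""
    return sum(wi * fi for wi, fi in zip(w, features(state, power)))


def step(state, power, rng):
    """Transmit, then draw the next harvest, data arrival and channel gain.

    Data arriving at a full buffer is dropped (a hypothetical assumption).
    """
    battery, buffer, _ = state
    sent = throughput(state, power)
    battery = min(battery - power + rng.choice(HARVEST), BATTERY_MAX)
    buffer = min(buffer - sent + rng.choice(ARRIVALS), BUFFER_MAX)
    return sent, (battery, buffer, rng.choice(CHANNELS))


def choose(w, state, eps, rng):
    """Epsilon-greedy choice among the feasible power levels."""
    actions = feasible_powers(state)
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda p: q_value(w, state, p))


def train_sarsa(steps=20000, alpha=0.01, gamma=0.9, eps=0.1, seed=0):
    """Semi-gradient SARSA with linear function approximation."""
    rng = random.Random(seed)
    w = [0.0] * len(features((0, 0.0, 1.0), 0))
    state = (BATTERY_MAX, 0.0, rng.choice(CHANNELS))
    power = choose(w, state, eps, rng)
    for _ in range(steps):
        reward, next_state = step(state, power, rng)
        next_power = choose(w, next_state, eps, rng)
        td_error = (reward + gamma * q_value(w, next_state, next_power)
                    - q_value(w, state, power))
        phi = features(state, power)
        w = [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]
        state, power = next_state, next_power
    return w


if __name__ == "__main__":
    weights = train_sarsa()
    print("learned weights:", [round(x, 3) for x in weights])
```

Following the separation argument in the abstract, the transmitter and the relay could each run such a learner independently; for the relay, the "arrivals" would be the data it decodes from the transmitter rather than an external process.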
