A learning theoretic approach to energy harvesting communication system optimization

A machine-to-machine (M2M) system composed of low-power embedded devices powered by energy-scavenging mechanisms is considered. The data arrival, energy arrival, and channel state processes are all modeled as finite-state Markov processes. Assuming that the state transition probabilities characterizing these processes are unknown at the transmitter, a learning-theoretic approach is introduced, and it is shown that the transmitter can learn the optimal transmission policy that maximizes the expected total data transmitted during its lifetime. In addition to the learning-theoretic approach, online and offline optimization problems are also studied for the same setup. By characterizing the optimal performance for all three problems, we quantify the loss due to the transmitter's lack of knowledge about the underlying processes. Numerical results corroborate the theoretical findings and show that, for a given number of learning iterations, the learning-theoretic approach reaches 90% of the performance of the online optimization solution.
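
As a concrete illustration of the learning-theoretic approach, the sketch below applies standard tabular Q-learning to a toy energy harvesting transmitter in Python. Everything in the environment is an assumption made for illustration only: the two-state Markov chains for energy arrivals, data arrivals, and the channel, their transition probabilities, the battery and data-buffer sizes, the "energy units spent per slot" action, and the learning hyperparameters are hypothetical and are not taken from the paper, whose exact formulation and algorithm may differ.

    import random

    # Hypothetical two-state Markov chains for energy arrivals, data arrivals,
    # and the channel; their transition probabilities are unknown to the learner.
    P_ENERGY  = [[0.7, 0.3], [0.4, 0.6]]   # P[current state][next state = 0, 1]
    P_DATA    = [[0.6, 0.4], [0.5, 0.5]]
    P_CHANNEL = [[0.8, 0.2], [0.3, 0.7]]   # 0 = bad channel, 1 = good channel
    B_MAX, D_MAX = 3, 3                    # battery and data-buffer capacities
    ACTIONS = range(B_MAX + 1)             # energy units spent in a time slot

    def step_chain(s, P):
        """Advance a two-state Markov chain by one step."""
        return 0 if random.random() < P[s][0] else 1

    def transition(state, action):
        """One time slot: transmit, harvest energy, receive new data."""
        battery, buffer, e, d, h = state
        tx = min(action, battery, buffer) if h == 1 else 0   # packets delivered
        battery = min(B_MAX, battery - tx + e)                # harvest e units
        buffer  = min(D_MAX, buffer - tx + d)                 # d new packets arrive
        nxt = (battery, buffer, step_chain(e, P_ENERGY),
               step_chain(d, P_DATA), step_chain(h, P_CHANNEL))
        return nxt, tx                                        # reward = throughput

    # Tabular Q-learning: learns a transmission policy without ever using
    # P_ENERGY, P_DATA, or P_CHANNEL directly.
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
    Q = {}

    def q(s, a):
        """Q-value with a default of zero for unvisited state-action pairs."""
        return Q.get((s, a), 0.0)

    state = (B_MAX, 0, 0, 0, 1)
    for _ in range(200_000):
        if random.random() < EPSILON:                   # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: q(state, x))
        nxt, r = transition(state, a)
        best_next = max(q(nxt, x) for x in ACTIONS)
        Q[(state, a)] = q(state, a) + ALPHA * (r + GAMMA * best_next - q(state, a))
        state = nxt

    # Greedy transmission policy extracted from the learned Q-table.
    policy = {s: max(ACTIONS, key=lambda x: q(s, x)) for (s, _) in Q}
    print(len(policy), "states visited; action in a full-buffer, good-channel state:",
          policy.get((B_MAX, D_MAX, 1, 1, 1)))

In this sketch the agent converges toward spending its stored energy when the channel is good and the buffer is non-empty, which is the qualitative behavior one would expect from the learned policy; a discounted objective is used here purely for simplicity, whereas the paper's objective is the expected total data transmitted over the transmitter's lifetime.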
