Learning to survive: Achieving energy neutrality in wireless sensor networks using reinforcement learning

Energy harvesting is a promising approach to enabling autonomous, long-lived wireless sensor networks. Because typical energy sources exhibit time-varying behavior, each node embeds an energy manager that dynamically adapts the power consumption of the node to maximize quality of service while preventing power failure. In this work, RLMan, a novel energy management scheme based on reinforcement learning, is proposed. RLMan dynamically adapts its policy to the time-varying environment by continuously exploring while exploiting its current knowledge to improve quality of service. The proposed scheme has a very low memory footprint and requires very little computational power, making it suitable for online execution on sensor nodes. Moreover, it requires only the state of charge of the energy storage device as input, and is therefore practical to implement. RLMan was compared to three state-of-the-art energy management schemes, using simulations driven by energy traces from real measurements. Results show that RLMan can achieve gains of almost 70% in average throughput.
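
To illustrate the kind of online actor-critic update such a reinforcement-learning energy manager can run on a sensor node, the sketch below implements a toy Gaussian-policy actor-critic with linear function approximation over a single feature, the normalized state of charge. This is a minimal sketch, not the RLMan algorithm from the paper: the class name EnergyManagerSketch, the duty-cycle action, the reward definition, and all hyperparameters are illustrative assumptions.

```python
import math
import random

# Hypothetical hyperparameters; the actual RLMan formulation is in the paper.
ALPHA_ACTOR = 0.01    # actor (policy) learning rate
ALPHA_CRITIC = 0.05   # critic (value) learning rate
GAMMA = 0.95          # discount factor
SIGMA = 0.1           # exploration noise of the Gaussian policy


class EnergyManagerSketch:
    """Toy actor-critic energy manager.

    State  : normalized state of charge of the energy storage device (0..1).
    Action : normalized duty cycle (0..1) applied during the next slot.
    Reward : application-defined quality of service (e.g. packets delivered),
             zero when the node suffers a power failure.
    """

    def __init__(self):
        # Linear function approximation over the single state feature.
        self.theta = 0.0   # actor weight  -> mean duty cycle = sigmoid(theta * soc)
        self.w = 0.0       # critic weight -> value estimate  = w * soc

    def _mean_action(self, soc):
        return 1.0 / (1.0 + math.exp(-self.theta * soc))

    def act(self, soc):
        """Sample a duty cycle: exploit the current policy, explore via Gaussian noise."""
        mu = self._mean_action(soc)
        return min(1.0, max(0.0, random.gauss(mu, SIGMA)))

    def update(self, soc, action, reward, next_soc):
        """One temporal-difference actor-critic step."""
        td_error = reward + GAMMA * self.w * next_soc - self.w * soc
        # Critic: move the value estimate toward the TD target.
        self.w += ALPHA_CRITIC * td_error * soc
        # Actor: Gaussian policy-gradient step, scaled by the TD error.
        mu = self._mean_action(soc)
        grad_log_pi = (action - mu) / (SIGMA ** 2) * mu * (1.0 - mu) * soc
        self.theta += ALPHA_ACTOR * td_error * grad_log_pi
```

Under these assumptions, a node would call act() once per decision slot to pick a duty cycle, apply it, measure the resulting quality of service, and then call update() with the observed reward and the new state of charge, so that the policy keeps tracking the time-varying harvested energy.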
