Design and Comparison of Reward Functions in Reinforcement Learning for Energy Management of Sensor Nodes

Interest in remote monitoring has grown thanks to recent advances in Internet-of-Things (IoT) paradigms. New applications have emerged that use small devices, called sensor nodes, capable of collecting data from the environment and processing them. However, ever more data are processed and transmitted, over longer operational periods. At the same time, battery technologies have not improved fast enough to cope with these increasing needs. This makes energy consumption an increasingly challenging issue, and miniaturized energy-harvesting devices have therefore emerged to complement traditional energy sources. Nevertheless, the harvested energy fluctuates significantly during node operation, increasing the uncertainty about the actually available energy resources. Recently, energy management approaches have been developed, in particular ones based on reinforcement learning. However, in reinforcement learning, the algorithm's performance depends greatly on the reward function. In this paper, we present two contributions. First, we explore five different reward functions (R1–R5) to identify the variables best suited to obtaining the desired behaviour. Experiments were conducted with the Q-learning algorithm to adjust the energy consumption to the energy harvested. Results with the five reward functions illustrate how this choice impacts the node's energy consumption. Second, we propose two additional reward functions (R6 and R7) able to find a compromise between energy consumption and node performance using a non-fixed balancing parameter. Our simulation results show that the proposed reward functions (R6 and R7) adjust the node's performance to the battery level and reduce the learning time.
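The approach described above can be sketched with a minimal tabular Q-learning loop. This is an illustrative example only, assuming a discretized battery level as the state, a small set of duty cycles as actions, and a simple reward that trades off node performance against battery depletion; it does not reproduce the paper's reward functions R1–R7, and the state/action granularity and hyperparameters (ALPHA, GAMMA, EPSILON) are assumptions.

```python
import random

# Assumed discretization: battery level in {0, ..., 9}; actions index duty cycles.
N_STATES, N_ACTIONS = 10, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

# Q-table: one row per battery state, one column per duty-cycle action.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def reward(battery, action):
    # Illustrative reward: favour higher duty cycles (performance) when the
    # battery is full, penalize them when it is low. Not one of the paper's
    # reward functions; purely an assumed example.
    return action * (battery / (N_STATES - 1))

def choose_action(state):
    # Epsilon-greedy action selection over the Q-table row for this state.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, r, next_state):
    # Standard Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (r + GAMMA * best_next - Q[state][action])
```

In a full simulation, each step would observe the battery level, pick a duty cycle with `choose_action`, apply it, measure the new battery level (including harvested energy), and call `update`; swapping the `reward` function is the only change needed to compare alternatives such as R1–R7.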
