Learning a Policy for Coordinated Sampling in Body Sensor Networks

This paper describes a method for learning coordination policies in body sensor networks. Learning a compact coordination policy is important for implementing the policy on sensor nodes with limited memory. We present a novel algorithm, Reinforcement Learning Average Approximation (RLAA), that learns a local coordination policy for each sensor node from globally joint rewards. These local policies are obtained by reinforcement learning and by averaging state-action tables under a stochastic process model. We demonstrate the performance of this learning scheme on a simulation of an existing body sensor network interfaced with transdermal sensors. Experimental results show that the RLAA algorithm performs significantly better than a random policy and close to the optimal policy obtained by solving a global Markov Decision Process, while the learning step remains fast. The results also show that RLAA scales to networks with large state spaces (in terms of the number of sensors and the degree of discretization).
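As a rough illustration of the RLAA idea summarized above, the sketch below shows tabular Q-learning driven by a globally shared (joint) reward, followed by averaging of state-action tables into one compact local policy. This is a minimal sketch, not the paper's actual implementation: the environment interface (`reset`/`step` returning a joint reward), the function names, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of RLAA's two ingredients: (1) per-node tabular
# Q-learning against a globally joint reward, and (2) averaging the
# learned state-action tables into a compact local policy. The `env`
# interface and all parameter choices are assumptions for illustration.

def q_learning_table(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.95, epsilon=0.1, rng=None):
    """Learn one state-action table from globally joint rewards."""
    rng = rng or np.random.default_rng()
    q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()                      # local state index of this node
        done = False
        while not done:
            # Epsilon-greedy action selection on the local table.
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(q[s]))
            # Assumed interface: the reward returned is the joint reward
            # shared by all nodes, not a per-node reward.
            s_next, joint_reward, done = env.step(a)
            q[s, a] += alpha * (joint_reward
                                + gamma * np.max(q[s_next]) - q[s, a])
            s = s_next
    return q

def rlaa_local_policy(tables):
    """Average several state-action tables into one compact local policy."""
    q_avg = np.mean(np.stack(tables), axis=0)
    return np.argmax(q_avg, axis=1)          # one action per local state
```

Note that the averaged policy stored on a node is just one action index per local state, which is why the resulting policy is compact enough for memory-limited sensor nodes.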
