Maximizing Quality of Information From Multiple Sensor Devices: The Exploration vs Exploitation Tradeoff

This paper investigates Quality of Information (QoI) aware adaptive sampling in a system where two sensor devices report information to an end user. The system carries out a sequence of tasks, each relating to a random event that must be observed. The information accumulated from the sensor devices is reported once per task to a higher-layer application at the end user. The utility of each report depends on both its timeliness and the quality of the observations. Quality can be improved by accumulating more observations for the same task, at the expense of delay. We assume that new tasks arrive randomly and that the quality of each new observation is also random. The goal is to maximize the time-average quality of information subject to cost constraints. We solve the problem by combining dynamic programming and Lyapunov optimization. Our algorithms involve solving a 2-dimensional optimal stopping problem and result in a 2-dimensional threshold rule. When task arrivals are i.i.d., the optimal solution to the stopping problem can be closely approximated with a small number of simplified value iterations. When task arrivals are periodic, we derive an approximately optimal stopping policy with a structured form. We also introduce hybrid policies, applied on top of the proposed adaptive sampling algorithms, to further improve performance. Numerical results demonstrate that our policies perform near-optimally. Overall, this work provides new insights into network operation based on QoI attributes.
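
To make the idea of a 2-dimensional stopping problem solved by value iteration concrete, the following is a minimal Python sketch of a toy instance. It is not the paper's model or algorithm. The state `(q1, q2)` (accumulated quality level per sensor), the Bernoulli quality increments `p1` and `p2`, the per-slot `cost`, the concave `reward` function, and the function name `solve_stopping` are all assumptions made for illustration, and the Lyapunov optimization layer that handles the time-average constraints is omitted.

```python
import math

def solve_stopping(n=20, p1=0.6, p2=0.3, cost=0.05, rate=0.2, sweeps=200):
    """Value iteration for a toy 2-D optimal stopping problem (illustrative only).

    State (q1, q2): accumulated quality level of each sensor, capped at n.
    Stop: collect reward(q1, q2). Continue: pay `cost`, then each sensor's
    level increases by 1 independently with probability p1 / p2 (assumed).
    """
    def reward(q1, q2):
        # Hypothetical concave utility: diminishing returns in total quality.
        return 1.0 - math.exp(-rate * (q1 + q2))

    def continue_value(V, q1, q2):
        a, b = min(q1 + 1, n), min(q2 + 1, n)  # levels saturate at n
        expected = (p1 * p2 * V[a][b]
                    + p1 * (1 - p2) * V[a][q2]
                    + (1 - p1) * p2 * V[q1][b]
                    + (1 - p1) * (1 - p2) * V[q1][q2])
        return expected - cost

    # Start from the stop-immediately value and repeatedly apply the
    # Bellman update V <- max(stop reward, value of continuing).
    V = [[reward(i, j) for j in range(n + 1)] for i in range(n + 1)]
    for _ in range(sweeps):
        V = [[max(reward(i, j), continue_value(V, i, j))
              for j in range(n + 1)] for i in range(n + 1)]

    # Stopping region: states where reporting now is at least as good as waiting.
    stop = [[reward(i, j) >= continue_value(V, i, j)
             for j in range(n + 1)] for i in range(n + 1)]
    return V, stop

if __name__ == "__main__":
    V, stop = solve_stopping()
    # For each q1, print the smallest q2 at which stopping becomes optimal;
    # this traces the boundary of the 2-D stopping region in the toy model.
    for q1, row in enumerate(stop):
        print(q1, row.index(True))
```

In this toy setting the stopping region is a set of grid points in the `(q1, q2)` plane, and its boundary plays the role of a 2-dimensional threshold: the policy keeps sampling below the boundary and reports once the state crosses it.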
