Structured Threshold Policies for Dynamic Sensor Scheduling—A Partially Observed Markov Decision Process Approach

We consider the optimal sensor scheduling problem formulated as a partially observed Markov decision process (POMDP). Due to operational constraints, at each time instant the scheduler can dynamically select one out of a finite number of sensors and record a noisy measurement of an underlying Markov chain. The aim is to compute the optimal measurement scheduling policy so as to minimize a cost function comprising estimation errors and measurement costs. The formulation results in a nonstandard POMDP that is nonlinear in the information state. We give sufficient conditions on the cost function, the dynamics of the Markov chain, and the observation probabilities under which the optimal scheduling policy has a threshold structure with respect to a monotone likelihood ratio (MLR) ordering. As a result, the optimal scheduling policy is inexpensive to implement. We then present stochastic approximation algorithms for estimating the best linear threshold policy with respect to the MLR order.
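To make the abstract's ingredients concrete, the following is a minimal Python sketch, not the paper's algorithm: it propagates the information state with an HMM filter, applies a threshold scheduling policy (for a two-state chain, the MLR order on beliefs reduces to comparing the scalar pi[1] against a threshold), and tunes the threshold by an SPSA-style stochastic approximation. The transition matrix P, observation likelihoods B, measurement costs, horizon, and SPSA gains are all illustrative assumptions.

```python
# Minimal sketch of POMDP sensor scheduling with a threshold policy.
# All numerical values below are illustrative assumptions, not taken
# from the paper: 2-state Markov chain, two sensors, scalar threshold.
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.9, 0.1],           # Markov chain transition matrix (assumed)
              [0.2, 0.8]])
# Observation likelihoods B[u][x, y]: sensor u, state x, observation y.
B = [np.array([[0.8, 0.2], [0.3, 0.7]]),    # sensor 0: accurate, costly
     np.array([[0.6, 0.4], [0.45, 0.55]])]  # sensor 1: noisy, cheap
meas_cost = np.array([1.0, 0.2])            # per-use measurement costs (assumed)

def hmm_filter(pi, u, y):
    """One HMM filter step: Markov prediction, then Bayesian measurement update."""
    unnorm = B[u][:, y] * (P.T @ pi)
    return unnorm / unnorm.sum()

def threshold_policy(pi, theta):
    """Threshold policy on the belief. For a 2-state chain the MLR order
    reduces to ordering pi[1], so the policy selects the accurate sensor
    only when pi[1] exceeds the threshold theta."""
    return 0 if pi[1] >= theta else 1

def simulate_cost(theta, horizon=200):
    """Monte Carlo estimate of average estimation-error plus measurement cost."""
    x = 0
    pi = np.array([0.5, 0.5])
    total = 0.0
    for _ in range(horizon):
        u = threshold_policy(pi, theta)
        x = rng.choice(2, p=P[x])              # Markov chain transition
        y = rng.choice(2, p=B[u][x])           # noisy sensor measurement
        pi = hmm_filter(pi, u, y)
        total += (1.0 - pi[x]) + meas_cost[u]  # error proxy + sensor cost
    return total / horizon

# SPSA-style stochastic approximation over the scalar threshold, with
# standard (assumed) gain sequences; the paper optimizes over linear MLR
# threshold policies, collapsed here to one parameter for the 2-state case.
theta = 0.5
for k in range(100):
    a_k, c_k = 0.1 / (k + 1) ** 0.602, 0.1 / (k + 1) ** 0.101
    delta = rng.choice([-1.0, 1.0])
    grad = (simulate_cost(theta + c_k * delta) -
            simulate_cost(theta - c_k * delta)) / (2 * c_k * delta)
    theta = float(np.clip(theta - a_k * grad, 0.0, 1.0))
print(f"estimated threshold: {theta:.3f}")
```

For a chain with more than two states, the scalar threshold above would be replaced by a linear threshold on the belief simplex, which is what the linear MLR order threshold policies of the abstract parametrize.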
