Non-Myopic Multi-Aspect Sensing with Partially Observable Markov Decision Processes

We consider the problem of sensing a concealed or distant target by interrogation from multiple sensors situated on a single platform. The available actions are selection of the next relative target-platform orientation and of the next sensor to be deployed. The target is modeled in terms of a set of states, each state representing a contiguous set of target-sensor orientations over which the scattering physics is relatively stationary. The sequence of states sampled at multiple target-sensor orientations may be modeled as a Markov process. The sensor only has access to the scattered fields, without knowledge of the particular state being sampled, and therefore the problem is modeled as a partially observable Markov decision process (POMDP). The POMDP yields a policy, in which the belief state at any point is mapped to a corresponding action. The non-myopic policy is compared to an approximate myopic approach, with example results presented for measured underwater acoustic scattering data.
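
To make the notion of a belief state concrete, the following is a minimal sketch of the standard discrete POMDP belief update, together with one possible myopic action rule. It is an illustration under stated assumptions, not the method of the paper: the names `belief_update`, `expected_entropy`, and `myopic_action`, the toy matrices `T` and `O`, and the choice of one-step expected posterior entropy as the myopic criterion are all hypothetical.

```python
import numpy as np

def belief_update(belief, T, O, action, obs):
    """Bayes update: b'(s') is proportional to O[a][s', o] * sum_s b(s) T[a][s, s']."""
    predicted = belief @ T[action]            # prior over next state
    unnorm = O[action][:, obs] * predicted    # weight by observation likelihood
    total = unnorm.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnorm / total

def expected_entropy(belief, T, O, action):
    """Expected posterior entropy (nats) after taking `action` once."""
    p_obs = (belief @ T[action]) @ O[action]  # predictive observation distribution
    h = 0.0
    for obs, p in enumerate(p_obs):
        if p > 0.0:
            post = belief_update(belief, T, O, action, obs)
            nz = post[post > 0.0]
            h += p * float(-(nz * np.log(nz)).sum())
    return h

def myopic_action(belief, T, O):
    """Assumed myopic rule: pick the action with the lowest one-step expected entropy."""
    return min(range(len(T)), key=lambda a: expected_entropy(belief, T, O, a))

# Toy model (assumed values): 2 target states, 2 actions, 2 observations.
# T[a][s, s'] is the state transition probability, O[a][s', o] the observation probability.
T = [np.array([[0.9, 0.1], [0.1, 0.9]])] * 2
O = [np.array([[0.5, 0.5], [0.5, 0.5]]),      # action 0: uninformative sensor
     np.array([[0.9, 0.1], [0.1, 0.9]])]      # action 1: informative sensor
b0 = np.array([0.5, 0.5])

b1 = belief_update(b0, T, O, action=1, obs=0)
best = myopic_action(b0, T, O)
```

In this toy model the myopic rule prefers the informative sensor. A non-myopic policy would instead map each belief to an action by optimizing expected reward over multiple future steps, which this sketch does not attempt.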
