Sensor scheduling for hunting elusive hiding targets via Whittle's restless bandit index policy

We consider a sensor scheduling model in which a set of identical sensors is used to hunt a larger set of heterogeneous targets, each located at its own site. In each discrete time slot, a target's state changes randomly between “exposed” and “hidden” according to Markovian transition probabilities that depend on whether its site is searched, which makes the targets elusive. Sensors are imperfect: a sensor searching a site with an exposed target fails to detect it with a positive misdetection probability. We formulate the problem of scheduling the sensors to search the sites as a partially observable Markov decision process (POMDP). The objective is to maximize the expected total discounted value of the rewards earned when targets are hunted, minus the search costs incurred. Since finding an optimal policy is intractable, we introduce a tractable heuristic search policy of priority-index type based on the Whittle index for restless bandits. Preliminary computational results indicate that this policy is nearly optimal and can substantially outperform the myopic policy and other simple heuristics.
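The abstract does not spell out how the index is computed. As a rough illustration only, the following sketch makes several assumptions about a single site.

- The belief `b` is the probability that the site's target is exposed.
- A search detects an exposed target with probability `1 - q` and earns reward `R`. It always costs `c`.
- A detection is assumed to reveal the "exposed" state, after which the target keeps evolving under the searched-site dynamics.

The sketch computes, at a given belief, the passive subsidy λ at which searching and resting are equally attractive. It does this by bisection on λ combined with value iteration on a discretized belief grid. This crossing point is the Whittle index when the site is indexable, but indexability is assumed here, not checked. This is a generic numerical approach, not necessarily the authors' derivation. All names (`Site`, `whittle_index`, `index_policy`, `myopic_index`, the grid size, the transition numbers) are hypothetical.

```python
import numpy as np
from dataclasses import dataclass

# Belief grid: b = P(target at the site is exposed at the start of a slot).
GRID = np.linspace(0.0, 1.0, 101)


@dataclass
class Site:
    """Hypothetical per-site parameters (illustrative names, not from the paper)."""
    reward: float          # R: reward earned when the target is hunted
    cost: float            # c: cost of searching the site for one slot
    miss: float            # q: misdetection probability for an exposed target
    P_passive: np.ndarray  # 2x2 transitions if not searched; state 0 = hidden, 1 = exposed
    P_active: np.ndarray   # 2x2 transitions if searched
    beta: float = 0.9      # discount factor


def successors(s: Site, b):
    """Detection probability and next-slot beliefs for each action."""
    b = np.asarray(b, dtype=float)
    d = (1.0 - s.miss) * b  # P(detection) when the site is searched
    # Bayes posterior P(exposed | searched, nothing found); unused where d == 1.
    post = np.divide(s.miss * b, 1.0 - d, out=np.zeros_like(b), where=(1.0 - d) > 0)
    nxt_miss = post * s.P_active[1, 1] + (1.0 - post) * s.P_active[0, 1]
    nxt_det = s.P_active[1, 1]  # ASSUMPTION: a detection reveals "exposed"
    nxt_pass = b * s.P_passive[1, 1] + (1.0 - b) * s.P_passive[0, 1]
    return d, nxt_det, nxt_miss, nxt_pass


def q_values(s: Site, lam: float, V: np.ndarray, b):
    """Q-values of searching and of resting (resting earns the subsidy lam)."""
    d, nxt_det, nxt_miss, nxt_pass = successors(s, b)

    def v(x):
        return np.interp(x, GRID, V)  # linear interpolation of V between grid points

    q_act = s.reward * d - s.cost + s.beta * (d * v(nxt_det) + (1.0 - d) * v(nxt_miss))
    q_pas = lam + s.beta * v(nxt_pass)
    return q_act, q_pas


def solve_subsidy_problem(s: Site, lam: float, tol=1e-10, max_iter=10_000):
    """Value iteration for the single-site problem with passive subsidy lam."""
    V = np.zeros_like(GRID)
    for _ in range(max_iter):
        V_new = np.maximum(*q_values(s, lam, V, GRID))
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new


def q_gap(s: Site, lam: float, b: float) -> float:
    """Q(search) - Q(rest) at belief b under subsidy lam."""
    q_act, q_pas = q_values(s, lam, solve_subsidy_problem(s, lam), np.array([b]))
    return float(q_act[0] - q_pas[0])


def whittle_index(s: Site, b: float, iters=30) -> float:
    """Subsidy making searching and resting equally attractive at b (bisection)."""
    span = s.reward * (1.0 - s.miss)                # range of one-slot search rewards
    lo = -s.cost - s.beta * span / (1.0 - s.beta)   # searching is optimal at this subsidy
    hi = span - s.cost                              # resting is optimal at this subsidy
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_gap(s, mid, b) > 0:
            lo = mid  # searching still strictly better: index lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def myopic_index(s: Site, b: float) -> float:
    """One-slot expected net reward of searching (myopic baseline)."""
    return s.reward * (1.0 - s.miss) * b - s.cost


def index_policy(sites, beliefs, M, index=whittle_index):
    """Search the M sites with the largest indices (ties broken by site order)."""
    vals = np.array([index(s, b) for s, b in zip(sites, beliefs)])
    return [int(i) for i in np.argsort(-vals, kind="stable")[:M]]


if __name__ == "__main__":
    # Illustrative numbers only: searching a site makes its target more likely to hide.
    P_pas = np.array([[0.7, 0.3], [0.2, 0.8]])
    P_act = np.array([[0.9, 0.1], [0.5, 0.5]])
    sites = [Site(10.0, 1.0, 0.2, P_pas, P_act),
             Site(6.0, 0.5, 0.1, P_pas, P_act),
             Site(10.0, 1.0, 0.4, P_pas, P_act)]
    beliefs = [0.5, 0.7, 0.6]
    print("Whittle-index choice:", index_policy(sites, beliefs, M=2))
    print("myopic choice:", index_policy(sites, beliefs, M=2, index=myopic_index))
```

Passing `index=myopic_index` yields a one-slot greedy rule, which is one plausible reading of the myopic baseline. The bisection bracket comes from bounding the value function by the smallest and largest one-slot rewards, so it holds for any transition matrices. Whether the resulting crossing point is a genuine Whittle index still depends on the site being indexable.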
