Surveillance in an abruptly changing world via multiarmed bandits

We study a path planning problem in an environment that changes abruptly due to the arrival of unknown spatial events. The objective is to collect the data that is most evidential about the events. We formulate this problem as a multiarmed bandit (MAB) problem with Gaussian rewards and change points, and address the fundamental tradeoff between learning the true event (exploration) and collecting the data that is most evidential about the true event (exploitation). We extend the switching-window UCB algorithm for MAB problems with bounded rewards and change points to the setting of correlated Gaussian rewards, and develop the switching-window UCL (SW-UCL) algorithm. We then extend SW-UCL to an adaptive SW-UCL algorithm that adapts itself using statistical change detection. We also develop a block SW-UCL algorithm that reduces the number of transitions among arms and is more amenable to robotic applications.
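To make the windowing idea concrete, the following is a minimal sketch of a generic windowed UCB-style rule on a two-armed Gaussian bandit with one change point. It is not the SW-UCL algorithm described above: it assumes independent arms with a known noise standard deviation, whereas SW-UCL handles correlated Gaussian rewards. The function name `windowed_ucb`, the callback `means_at`, and the parameters `window`, `sigma`, and `c` are hypothetical choices made for illustration only.

```python
import math
import random
from collections import deque


def windowed_ucb(means_at, n_arms, horizon, window=50, sigma=1.0, c=2.0, seed=0):
    """Generic windowed UCB-style rule (illustrative sketch, not SW-UCL).

    means_at(t) is an assumed callback returning the true mean reward of
    every arm at time t; it lets the environment change abruptly.
    Only the most recent `window` observations are used, so stale data
    collected before a change point is eventually forgotten.
    """
    rng = random.Random(seed)
    history = deque(maxlen=window)  # most recent (arm, reward) pairs
    choices = []
    for t in range(horizon):
        # Recompute per-arm statistics from the window only.
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, reward in history:
            counts[arm] += 1
            sums[arm] += reward

        unseen = [a for a in range(n_arms) if counts[a] == 0]
        if unseen:
            # An arm with no sample in the window is tried again.
            arm = unseen[0]
        else:
            n_eff = len(history)  # number of samples inside the window

            def index(a):
                # Windowed mean plus an exploration bonus scaled by sigma.
                bonus = sigma * math.sqrt(c * math.log(n_eff) / counts[a])
                return sums[a] / counts[a] + bonus

            arm = max(range(n_arms), key=index)

        reward = rng.gauss(means_at(t)[arm], sigma)
        history.append((arm, reward))
        choices.append(arm)
    return choices


def example_means(t):
    # Hypothetical environment: the better arm switches at t = 500.
    return [1.0, 0.0] if t < 500 else [0.0, 1.0]


if __name__ == "__main__":
    picks = windowed_ucb(example_means, n_arms=2, horizon=1000, window=50, sigma=0.1)
    print("arm 0 share before change:", picks[400:500].count(0) / 100)
    print("arm 1 share after change:", picks[900:].count(1) / 100)
```

In this sketch, a shorter window reacts faster to a change point but discards more data between changes. Every arm is also revisited once it leaves the window, which causes frequent switching among arms.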
