Adaptive Sampling using POMDPs with Domain-Specific Considerations

We investigate improvements to Monte Carlo Tree Search-based solvers for Partially Observable Markov Decision Processes (POMDPs) when applied to adaptive sampling problems. We propose improvements in three areas: rollout allocation, the action exploration algorithm, and plan commitment. The first allocates a different number of rollouts depending on how many actions the agent has already taken in an episode. We find that rollouts are more valuable after some initial information has been gained about the environment. Thus, allocating a fixed number of rollouts at each step, so that the cumulative number grows linearly, is not appropriate for adaptive sampling tasks. The second alters which actions the agent chooses to explore when building the planning tree. We find that by using knowledge of the number of rollouts allocated, the agent can choose actions to explore more effectively. The third improvement is in determining how many actions the agent should take from one plan. Typically, an agent takes only the first action from the planning tree and then calls the planner again from the new state. Using statistical techniques, we show that it is possible to greatly reduce the number of rollouts by increasing the number of actions taken from a single planning tree without affecting the agent’s final reward. Finally, we demonstrate experimentally, on simulated and real aquatic data from an underwater robot, that these improvements can be combined, leading to better adaptive sampling. The code for this work is available at https://github.com/uscresl/AdaptiveSamplingPOMCP.
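To make the rollout allocation and plan commitment ideas concrete, the following is a minimal Python sketch and not the authors' implementation. Every name in it (`allocate_rollouts`, `welch_t`, `actions_to_commit`, `early_fraction`, `t_threshold`) is hypothetical. The linearly rising schedule is only one possible non-uniform allocation, and the use of a Welch t statistic with a fixed threshold is an assumption standing in for the unspecified "statistical techniques" in the abstract.

```python
import math
from statistics import mean, variance


def allocate_rollouts(total_budget, num_steps, early_fraction=0.5):
    """Split a rollout budget across steps, giving early steps fewer rollouts.

    Assumed schedule: per-step weights rise linearly from `early_fraction`
    to 1.0, so later steps (after some information is gathered) get more.
    """
    if num_steps == 1:
        return [total_budget]
    weights = [
        early_fraction + (1.0 - early_fraction) * t / (num_steps - 1)
        for t in range(num_steps)
    ]
    scale = total_budget / sum(weights)
    alloc = [int(w * scale) for w in weights]
    alloc[-1] += total_budget - sum(alloc)  # give the rounding remainder to the last step
    return alloc


def welch_t(a, b):
    """Welch t statistic for two samples of rollout returns (unequal variances)."""
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if denom == 0.0:
        return math.inf if diff > 0 else 0.0
    return diff / denom


def actions_to_commit(levels, t_threshold=2.0):
    """Count consecutive plan depths whose best action clearly beats the runner-up.

    `levels` is a list of (best_returns, runner_up_returns) pairs, one per
    depth of the planning tree. The agent always takes at least one action.
    """
    committed = 0
    for best, runner_up in levels:
        if welch_t(best, runner_up) < t_threshold:
            break
        committed += 1
    return max(committed, 1)
```

For example, `allocate_rollouts(1000, 5)` spends the whole budget while giving the first step fewer rollouts than the last, and `actions_to_commit` stops at the first depth where the two candidate actions are not clearly separated, at which point the agent would replan.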
