Paper information - Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning

Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose introducing prior knowledge, in the form of (probabilistic) relationships among discrete state variables, into online planning based on the well-known POMCP algorithm. In particular, we use hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints, and we extend POMCP to take advantage of them. Results on a case study based on Rocksample show that using this knowledge significantly improves the performance of the algorithm. The extent of the improvement depends on the amount of knowledge encoded in the constraints and reaches 50% of the average discounted return in the most favorable cases that we analyzed.
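To make the idea of state-variable constraints concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes a Rocksample-like state made of binary rock values, and it assumes that prior knowledge is given as pairwise "these two rocks probably have the same value" relations. One plausible use of such knowledge, assumed here, is to bias the particles that initialize a particle-based belief. All names (`mrf_weight`, `sample_particles`, `edge_probs`) are hypothetical, and the exhaustive enumeration is only feasible for a handful of variables.

```python
import itertools
import random


def mrf_weight(state, edge_probs):
    """Unnormalized weight of a binary assignment under a pairwise model.

    edge_probs maps a pair (i, j) to the assumed probability that
    variables i and j take the same value. A probability of 1.0 acts
    as a hard equality constraint, since any violating state gets weight 0.
    """
    weight = 1.0
    for (i, j), p_equal in edge_probs.items():
        weight *= p_equal if state[i] == state[j] else 1.0 - p_equal
    return weight


def sample_particles(n_vars, edge_probs, n_particles, rng):
    """Draw belief particles in proportion to their constraint weight."""
    # Enumerate all 2**n_vars assignments (small n_vars only).
    states = list(itertools.product((0, 1), repeat=n_vars))
    weights = [mrf_weight(s, edge_probs) for s in states]
    return rng.choices(states, weights=weights, k=n_particles)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical knowledge: rocks 0 and 1 always match (hard constraint),
    # rocks 1 and 2 match with probability 0.9 (soft constraint).
    knowledge = {(0, 1): 1.0, (1, 2): 0.9}
    particles = sample_particles(3, knowledge, 1000, rng)
    agree_12 = sum(p[1] == p[2] for p in particles) / len(particles)
    print("all particles satisfy the hard constraint:",
          all(p[0] == p[1] for p in particles))
    print("fraction with rocks 1 and 2 equal:", agree_12)
```

In this sketch a hard constraint is simply the limiting case of a soft one, so a single weighting function covers both kinds of knowledge. With no constraints at all, every assignment has weight 1 and the sampler falls back to a uniform prior.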
