Comparative criteria for partially observable contingent planning

In contingent planning under partial observability with sensing actions, agents actively use sensing to discover meaningful facts about the world. The solution can be represented as a plan tree or graph that branches on the possible observations. Typically, contingent planning seeks a satisfying plan, one that leads to a goal state at each leaf. In many applications, however, one may prefer some satisfying plans to others, such as plans that reach the goal with a lower average cost. Yet criteria such as average cost make an implicit assumption about the probabilities of outcomes, which may not hold when the stochastic dynamics of the environment are unknown. We focus on the problem of providing valid comparative criteria for contingent plan trees and graphs, allowing us to compare two plans and decide which one is preferable. We suggest a set of such comparison criteria: plan simplicity, dominance, and best and worst plan costs. We also argue that in some cases certain branches of the plan correspond to an unlikely combination of mishaps and can be ignored, and we provide methods for pruning such unlikely branches before comparing the plan graphs. We explain these criteria and discuss their validity, correlations, and application to real-world problems. We also suggest efficient algorithms for computing the comparative criteria where needed. We provide experimental results showing that existing contingent planners produce diverse plans that can be compared using these criteria.
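
As a rough illustration of the best and worst plan cost criteria, the sketch below computes both over a small plan tree without assuming any outcome probabilities. The `PlanNode` class, the per-action costs, the example plan, and the use of node count as a stand-in for plan simplicity are all assumptions made for this example; the abstract does not define the criteria at this level of detail, and the recursion assumes a tree rather than a general plan graph.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class PlanNode:
    """Hypothetical plan-tree node: one action, its cost, one child per observation."""
    action: str
    cost: float = 1.0
    children: Dict[str, "PlanNode"] = field(default_factory=dict)


def best_worst_cost(node: PlanNode) -> Tuple[float, float]:
    """Return (cheapest, most expensive) root-to-leaf cost below this node."""
    if not node.children:
        return node.cost, node.cost
    child_costs = [best_worst_cost(child) for child in node.children.values()]
    best = node.cost + min(b for b, _ in child_costs)
    worst = node.cost + max(w for _, w in child_costs)
    return best, worst


def count_nodes(node: PlanNode) -> int:
    """Node count, used here as an assumed proxy for plan simplicity."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Illustrative plan: sense a door, then branch on the observation.
plan = PlanNode("sense-door", 1, {
    "open": PlanNode("walk-through", 2),
    "closed": PlanNode("open-door", 3, {"opened": PlanNode("walk-through", 2)}),
})

print(best_worst_cost(plan), count_nodes(plan))
```

For this example plan the best-case cost is 3 (the door is observed open), the worst-case cost is 6 (the door must be opened first), and the tree has 4 nodes. Two candidate plans could then be compared on these numbers directly, with no reliance on an assumed outcome distribution.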
