Planning with actively eliciting preferences

Abstract: Planning with preferences has been employed extensively to quickly generate high-quality plans. However, it may be difficult for a human expert to supply this information without knowledge of the reasoning employed by the planner. We consider the problem of actively eliciting preferences from a human expert during the planning process. Specifically, we study this problem in the context of the Hierarchical Task Network (HTN) planning framework, as it allows easy interaction with the human. We propose an approach in which the planner identifies when and where expert guidance will be most useful and seeks the expert's preferences accordingly to make better decisions. Our experimental results on several diverse planning domains show that the preferences gathered using the proposed approach improve the quality and speed of the planner, while reducing the burden on the human expert.
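To make the idea of "identifying when expert guidance will be most useful" concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithm): at an HTN decomposition point with several applicable methods, the planner queries the expert only when its own heuristic cannot clearly separate the candidates. All names (`score_gap`, `choose_method`, `ask_expert`, the gap threshold) are illustrative assumptions.

```python
# Hypothetical sketch of active preference elicitation at an HTN
# decomposition point. Names and the threshold are illustrative only.

def score_gap(scores):
    """Uncertainty proxy: gap between the two best heuristic scores."""
    top = sorted(scores, reverse=True)
    return top[0] - top[1] if len(top) > 1 else float("inf")

def choose_method(methods, heuristic, ask_expert, gap_threshold=0.1):
    """Pick a decomposition method for the current task.

    Query the expert only when the planner's heuristic scores for the
    candidate methods are too close to decide confidently; otherwise
    decide autonomously, sparing the expert an unnecessary query.
    """
    scores = [heuristic(m) for m in methods]
    if score_gap(scores) < gap_threshold:
        return ask_expert(methods)               # informative decision point
    return methods[scores.index(max(scores))]    # confident: decide alone
```

In this sketch, the expert is consulted on the small fraction of decision points where the heuristic is nearly indifferent, which is where a stated preference changes the outcome.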
