Preference-Guided Planning: An Active Elicitation Approach

Planning with preferences has been employed extensively to quickly generate high-quality plans. However, it can be difficult for a human expert to supply these preferences up front, without knowledge of the reasoning employed by the planner or of the distribution of planning problems it will face. We consider the problem of actively eliciting preferences from a human expert during the planning process. Specifically, we study this problem in the context of the Hierarchical Task Network (HTN) planning framework, since it naturally supports interaction with the human. Our experimental results on several diverse planning domains show that the preferences gathered with the proposed approach improve both the quality of the plans and the speed of the planner, while reducing the burden on the human expert.
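
As a rough illustration of the idea only (not the paper's actual system), the sketch below shows how an HTN-style decomposer might query the expert solely when several methods apply to the same task, then cache the answer as a preference for reuse. All names (Method, elicit_preference, plan) and the command-line prompt are illustrative assumptions.

# Hypothetical sketch: active preference elicitation during HTN-style decomposition.
# The planner asks the expert only at genuinely ambiguous choice points and caches answers.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Method:
    name: str
    task: str                      # task this method decomposes
    subtasks: List[str]            # ordered subtasks it produces

def elicit_preference(task: str, options: List[Method]) -> Method:
    """Query the human expert to pick among applicable methods (illustrative prompt)."""
    print(f"Which method should decompose '{task}'?")
    for i, m in enumerate(options):
        print(f"  [{i}] {m.name} -> {m.subtasks}")
    choice = int(input("choice: "))
    return options[choice]

def plan(task: str,
         methods: Dict[str, List[Method]],
         primitives: set,
         prefs: Dict[str, Method]) -> List[str]:
    """Depth-first decomposition that elicits a preference only when methods conflict."""
    if task in primitives:
        return [task]                          # primitive task: emit an action
    options = methods.get(task, [])
    if not options:
        raise ValueError(f"no method for task {task}")
    if len(options) == 1:
        chosen = options[0]                    # no ambiguity: do not bother the expert
    else:
        if task not in prefs:                  # ask once, then reuse the cached answer
            prefs[task] = elicit_preference(task, options)
        chosen = prefs[task]
    steps: List[str] = []
    for sub in chosen.subtasks:
        steps += plan(sub, methods, primitives, prefs)
    return steps

Given a method library and a set of primitive task names, repeated calls to plan share the prefs dictionary, so the expert is asked about each ambiguous task at most once; this captures the active-elicitation intent described in the abstract under the stated assumptions.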
