Characterizing EVOI-Sufficient k-Response Query Sets in Decision Problems

In finite decision problems where an agent can query its human user to obtain information about its environment before acting, a query's usefulness is measured in terms of its Expected Value of Information (EVOI). The usefulness of a query set is similarly measured in terms of the EVOI of the queries it contains. When the only constraint on what queries can be asked is that they have exactly k possible responses (with k ≥ 2), we show that the set of k-response decision queries (which ask the user to select his/her preferred decision given a choice of k decisions) is EVOI-Sufficient, meaning that no single k-response query can have higher EVOI than the best single k-response decision query for any decision problem. When multiple queries can be asked before acting, we provide a negative result showing that the set of depth-n query trees constructed from k-response decision queries is not EVOI-Sufficient. However, we also provide a positive result: the set of depth-n query trees constructed from k-response decision-set queries, which ask the user to select from among k sets of decisions as to which set contains the best decision, is EVOI-Sufficient. We conclude with a discussion and analysis of algorithms that draws on a connection to other recent work on decision-theoretic knowledge elicitation.
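To make the EVOI of a k-response decision query concrete, here is a minimal sketch under illustrative assumptions (the toy problem, its user types, decisions, and utilities are all hypothetical, not from the paper): a finite set of decisions, a prior over user "types" (utility functions), and a decision query that asks the user which of k offered decisions they prefer. The query's EVOI is the expected value of acting optimally after observing the response, minus the value of acting immediately under the prior.

```python
import itertools

def prior_value(decisions, types, prior, u):
    # Value of acting without asking: best decision under the prior.
    return max(sum(prior[t] * u[t][d] for t in types) for d in decisions)

def evoi_decision_query(query, decisions, types, prior, u):
    """EVOI of a k-response decision query: the user reports which
    decision in `query` has the highest utility under their true type
    (ties broken toward the first option, for simplicity)."""
    baseline = prior_value(decisions, types, prior, u)
    expected_posterior_value = 0.0
    for answer in query:
        # Types that would give this answer.
        supporting = [t for t in types
                      if max(query, key=lambda d: u[t][d]) == answer]
        p_answer = sum(prior[t] for t in supporting)
        if p_answer == 0:
            continue
        posterior = {t: prior[t] / p_answer for t in supporting}
        # Value of acting optimally given this response.
        post_value = max(sum(posterior[t] * u[t][d] for t in supporting)
                         for d in decisions)
        expected_posterior_value += p_answer * post_value
    return expected_posterior_value - baseline

# Hypothetical toy problem: two equally likely user types, three decisions.
types = ["t1", "t2"]
prior = {"t1": 0.5, "t2": 0.5}
decisions = ["a", "b", "c"]
u = {"t1": {"a": 1.0, "b": 0.0, "c": 0.5},
     "t2": {"a": 0.0, "b": 1.0, "c": 0.5}}

# Search over all 2-response decision queries for the highest EVOI.
best = max(itertools.combinations(decisions, 2),
           key=lambda q: evoi_decision_query(q, decisions, types, prior, u))
```

In this toy problem each type's preferred answer fully reveals the type, so asking "a or b?" raises the expected value from 0.5 (acting under the prior) to 1.0, for an EVOI of 0.5. The paper's first result says that restricting attention to queries of this form loses nothing: no other single k-response query can do better.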
