Efficient Online Learning for Optimizing Value of Information: Theory and Application to Interactive Troubleshooting

We consider the optimal value of information (VoI) problem, where the goal is to sequentially select a set of tests at minimal cost, so that one can efficiently make the best decision based on the observed outcomes. Existing algorithms are either heuristics with no guarantees, or scale poorly (with run time exponential in the number of available tests). Moreover, these methods assume a known distribution over the test outcomes, which is often not the case in practice. We propose an efficient sampling-based online learning framework to address the above issues. First, assuming the distribution over hypotheses is known, we propose a dynamic hypothesis enumeration strategy, which allows efficient information gathering with strong theoretical guarantees. We show that with a sufficient number of samples, one can identify a near-optimal decision with high probability. Second, when the parameters of the hypothesis distribution are unknown, we propose an algorithm that learns the parameters progressively via posterior sampling in an online fashion. We further establish a rigorous bound on the expected regret. We demonstrate the effectiveness of our approach on a real-world interactive troubleshooting application and show that one can efficiently make high-quality decisions with low cost.
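
To make the posterior-sampling idea concrete, below is a minimal Python sketch of online parameter learning under a Beta-Bernoulli model. It is an illustrative assumption, not the paper's algorithm: the test-selection rule and the helper `run_test` are hypothetical stand-ins for the VoI criterion and the troubleshooting environment.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): posterior sampling over
# unknown Bernoulli outcome probabilities for a set of tests, using
# Beta-Bernoulli conjugacy. The selection rule below is a placeholder for a
# value-of-information criterion.

rng = np.random.default_rng(0)

n_tests = 5
true_p = rng.uniform(0.1, 0.9, size=n_tests)  # unknown outcome probabilities

# Beta(1, 1) priors on each test's outcome probability.
alpha = np.ones(n_tests)
beta = np.ones(n_tests)

def run_test(i):
    """Simulate observing the binary outcome of test i (hypothetical helper)."""
    return rng.random() < true_p[i]

for episode in range(1000):
    # 1. Sample a parameter vector from the current posterior.
    theta = rng.beta(alpha, beta)
    # 2. Act with respect to the sampled parameters; here we pick the test
    #    whose sampled outcome probability is closest to 0.5, as a crude
    #    stand-in for an information-gathering criterion.
    i = int(np.argmin(np.abs(theta - 0.5)))
    # 3. Observe the outcome and update the posterior counts.
    outcome = run_test(i)
    alpha[i] += outcome
    beta[i] += 1 - outcome
```

In this style of algorithm, acting greedily with respect to a single posterior sample (rather than the posterior mean) is what yields the exploration behavior underlying Thompson sampling.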
