Optimal Value of Information in Graphical Models

Many real-world decision-making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making, one needs to select which tests to administer before deciding on the most effective treatment. It has been common practice to use heuristic-guided procedures for selecting observations. In this paper, we present the first efficient optimal algorithms for selecting observations for a class of probabilistic graphical models. For example, our algorithms allow us to optimally label hidden variables in Hidden Markov Models (HMMs). We provide results both for selecting the optimal subset of observations and for obtaining an optimal conditional observation plan. Furthermore, we prove a surprising result: in most graphical model tasks, an efficient algorithm designed for chain graphs, such as HMMs, can be generalized to polytree graphical models; in contrast, we prove that optimizing the value of information is NP^PP-hard even for polytrees. It also follows from our results that merely computing the decision-theoretic value of information objective functions that are commonly used in practice is #P-complete even on Naive Bayes models (a simple special case of polytrees). In addition, we consider several extensions, such as using our algorithms to schedule observation selection for multiple sensors. We demonstrate the effectiveness of our approach on several real-world datasets, including a prototype sensor network deployment for energy conservation in buildings.
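
To make the objective concrete, here is a minimal sketch, under illustrative assumptions, of the decision-theoretic value of information the abstract refers to. It is not the paper's algorithm: it brute-forces, on a toy Naive Bayes model with made-up probabilities (names such as `best_subset` are ours, not the paper's), the subset of observations that maximizes the expected probability of a correct MAP decision. The exponential enumeration over subsets and their outcomes is precisely the naive cost that the paper's algorithms avoid for chain-structured models and that the hardness results show cannot be avoided in general for polytrees.

```python
import itertools

# Hypothetical toy Naive Bayes model (illustrative numbers only):
# binary class C and three binary observations X0..X2.
p_c = {0: 0.6, 1: 0.4}          # prior P(C = c)
p_x1_given_c = [                # P(Xi = 1 | C = c) for each feature i
    {0: 0.2, 1: 0.8},
    {0: 0.3, 1: 0.7},
    {0: 0.5, 1: 0.9},
]

def likelihood(c, obs):
    """P(obs | C = c) for a partial observation {feature index: value}."""
    lik = 1.0
    for i, x in obs.items():
        p1 = p_x1_given_c[i][c]
        lik *= p1 if x == 1 else 1.0 - p1
    return lik

def posterior(obs):
    """Posterior P(C | obs), computed by Bayes' rule."""
    unnorm = {c: p_c[c] * likelihood(c, obs) for c in p_c}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def expected_reward(subset):
    """Value of observing `subset`: the expected probability that the MAP
    decision about C is correct, where the expectation is over the joint
    outcomes of the selected observations."""
    total = 0.0
    for values in itertools.product([0, 1], repeat=len(subset)):
        obs = dict(zip(subset, values))
        p_obs = sum(p_c[c] * likelihood(c, obs) for c in p_c)  # marginal P(obs)
        total += p_obs * max(posterior(obs).values())          # reward of MAP decision
    return total

def best_subset(budget):
    """Brute-force search over all subsets of at most `budget` observations.
    The enumeration is exponential in the number of features -- the cost the
    paper's efficient algorithms avoid for chains such as HMMs."""
    best = (expected_reward(()), ())
    for r in range(1, budget + 1):
        for subset in itertools.combinations(range(len(p_x1_given_c)), r):
            best = max(best, (expected_reward(subset), subset))
    return best

if __name__ == "__main__":
    value, chosen = best_subset(budget=2)
    print(f"best subset: {chosen}, expected reward: {value:.3f}")
```

Even in this tiny example, evaluating one subset requires summing over all of its outcome combinations; for richer reward functions and larger models this expectation is what the #P-completeness result in the paper concerns.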
