Budgeted Distribution Learning of Belief Net Parameters

Most learning algorithms assume that a training dataset is given initially. We address the common situation where data is not available at the outset but can be obtained, at a cost. We focus on learning Bayesian belief networks (BNs) over discrete variables. As such BNs are models of probability distributions, we consider the "generative" challenge of learning, for a fixed structure, the parameters that best match the true distribution. We focus on the budgeted learning setting, where there is a known fixed cost c_i for acquiring the value of the i-th feature for any specified instance, and a known total budget to spend acquiring all information. After formally defining this problem from a Bayesian perspective, we first consider non-sequential algorithms, which must decide, before seeing any results, which features of which instances to probe. We show this allocation problem is NP-hard even if all variables are independent, then prove that the greedy allocation algorithm IGA is optimal when the costs are uniform but can otherwise be sub-optimal. We then show that general (sequential) policies perform better than non-sequential ones, and explore the challenges of learning the parameters of general belief networks in this sequential setting, describing conditions under which the obvious round-robin algorithm will, versus will not, work optimally. We also explore the effectiveness of this and various other heuristic algorithms.
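To make the budgeted setting concrete, the sketch below simulates spending a fixed budget to buy feature values under the round-robin policy mentioned above, for the special case of independent binary variables with Beta priors. This is a minimal illustrative sketch under those assumptions; the names (probe, round_robin), the uniform Beta(1,1) priors, and the simulated oracle are ours for illustration, not the paper's implementation.

```python
# A minimal sketch of budgeted parameter learning for independent binary
# variables with Beta priors, acquired under a round-robin policy.
# Names and setup are illustrative assumptions, not the paper's code.

import random

def probe(true_params, i):
    """Simulate buying the value of feature i on a fresh instance."""
    return 1 if random.random() < true_params[i] else 0

def round_robin(true_params, costs, budget):
    """Cycle over features, buying one value at a time until the budget
    cannot cover even the cheapest feature; return posterior Beta counts."""
    n = len(true_params)
    counts = [[1, 1] for _ in range(n)]  # [zeros, ones], Beta(1, 1) priors
    i = 0
    while budget >= min(costs):
        if costs[i] <= budget:           # skip features we can no longer afford
            value = probe(true_params, i)
            counts[i][value] += 1        # Bayesian (Beta) count update
            budget -= costs[i]
        i = (i + 1) % n
    return counts

if __name__ == "__main__":
    random.seed(0)
    theta = [0.2, 0.5, 0.9]              # true (unknown) Bernoulli parameters
    counts = round_robin(theta, costs=[1, 1, 1], budget=60)
    for i, (zeros, ones) in enumerate(counts):
        mean = ones / (ones + zeros)     # posterior mean of theta_i
        print(f"feature {i}: posterior mean {mean:.3f} (true {theta[i]})")
```

A sequential policy would replace the fixed cycling order with a choice of the next probe based on the posteriors accumulated so far, which is why, as the abstract notes, sequential policies can outperform any fixed non-sequential allocation.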
