Learning Bayesian Networks for Solving Real-World Problems

Moninder Singh
Supervisor: Gregory M. Provan

Bayesian networks, which provide a compact graphical way to express complex probabilistic relationships among several random variables, are rapidly becoming the tool of choice for dealing with uncertainty in knowledge-based systems. However, approaches based on Bayesian networks have often been dismissed as unfit for many real-world applications, since probabilistic inference is intractable for most problems of realistic size, and algorithms for learning Bayesian networks impose the unrealistic requirement that datasets be complete. In this thesis, I present practical solutions to these two problems and demonstrate their effectiveness on several real-world problems.

The solution proposed to the first problem is to learn selective Bayesian networks, i.e., ones that use only a subset of the given attributes to model a domain. The aim is to learn networks that are smaller, and hence computationally simpler to evaluate, but retain the performance of networks induced using all attributes. I present two methods for inducing selective Bayesian networks from data and evaluate them on several different problems. Both methods are shown to induce selective networks that are not only significantly smaller and computationally simpler to evaluate, but also perform as well as, or better than, networks using all attributes.

To address the second problem, I propose a principled method, based on the EM algorithm, for learning both Bayesian network structure and probabilities from incomplete data, and evaluate its performance on several datasets with different amounts of missing data and different assumptions about the missing-data mechanisms. The proposed algorithm is shown to induce Bayesian networks that are very close to the actual underlying model.

Finally, I apply both methods to the task of diagnosing acute abdominal pain. Known to be a very difficult domain, this is a high-dimensional problem characterized by a large number of attributes and missing data. Several researchers have argued that the simplest Bayesian network, the naive Bayesian classifier, is optimal for this problem. My experiments on two datasets in this domain show that selective Bayesian networks not only use just a small fraction of the attributes, but also significantly outperform other methods, including the naive Bayesian classifier.
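The abstract does not say how its two selection methods work, so the following is only an illustrative sketch of the general idea of a selective model, not the thesis's algorithm. As an assumption, it swaps in a naive Bayesian classifier for the full network learner and uses a standard greedy forward search scored by accuracy on a held-out set. All function names (`train_nb`, `predict_nb`, `accuracy`, `select_attributes`, `make_data`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels, attrs):
    """Fit a naive Bayesian classifier using only the attribute indices in attrs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value -> count
    values = defaultdict(set)            # attribute -> values seen in training
    for row, y in zip(rows, labels):
        for a in attrs:
            value_counts[(a, y)][row[a]] += 1
            values[a].add(row[a])
    return class_counts, value_counts, values


def predict_nb(model, row, attrs):
    """Return the most probable class for row; ties go to the first class in sorted order."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in sorted(class_counts.items()):
        score = math.log(n_y / total)
        for a in attrs:
            # Laplace smoothing, with one extra slot reserved for unseen values.
            k = len(values[a]) + 1
            score += math.log((value_counts[(a, y)][row[a]] + 1) / (n_y + k))
        if score > best_score:
            best, best_score = y, score
    return best


def accuracy(train, valid, attrs):
    """Train on `train` with `attrs`, then return accuracy on `valid`."""
    (tr_rows, tr_labels), (va_rows, va_labels) = train, valid
    model = train_nb(tr_rows, tr_labels, attrs)
    hits = sum(predict_nb(model, r, attrs) == y for r, y in zip(va_rows, va_labels))
    return hits / len(va_labels)


def select_attributes(train, valid, n_attrs):
    """Greedy forward selection: repeatedly add the attribute that most improves
    held-out accuracy, and stop when no remaining attribute improves it."""
    selected = []
    best = accuracy(train, valid, selected)  # baseline: class prior only
    while True:
        candidates = [a for a in range(n_attrs) if a not in selected]
        # The -a term breaks accuracy ties in favour of the lowest attribute index.
        scored = [(accuracy(train, valid, selected + [a]), -a) for a in candidates]
        if not scored:
            break
        top_acc, neg_a = max(scored)
        if top_acc <= best:
            break
        selected.append(-neg_a)
        best = top_acc
    return selected, best


def make_data(indices):
    """Toy data (an assumption): attribute 0 determines the class, 1 and 2 are irrelevant."""
    rows = [(i % 2, (i // 2) % 2, (i // 4) % 3) for i in indices]
    labels = ["pos" if r[0] == 1 else "neg" for r in rows]
    return rows, labels


train = make_data(range(12))
valid = make_data(range(12, 20))
print(select_attributes(train, valid, 3))
```

On this toy data the search keeps only the one attribute that determines the class, which mirrors the claim that a model built on a small subset of attributes can match one built on all of them. Scoring on a single held-out split is a simplification; cross-validation would be a more robust choice on real data.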
