Bayesian Network Classifiers

Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong independence assumptions among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes yet maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches on problems from the University of California at Irvine repository and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
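
The TAN construction referred to in the abstract augments naive Bayes with a tree of dependencies among the attributes: estimate the class-conditional mutual information I(X_i; X_j | C) for every pair of attributes, build a maximum-weight spanning tree over the attributes using those weights, direct the tree away from an arbitrarily chosen root, and then make the class a parent of every attribute before estimating the conditional probability tables from (smoothed) counts. The following is a minimal illustrative sketch of the structure-learning step, not the authors' code; the function names, the toy data, and the use of Prim's algorithm are assumptions made for the example.

```python
# Sketch of TAN structure learning: Chow-Liu-style spanning tree with
# class-conditional mutual information as the edge weight.
# Hypothetical helper names; data is a list of tuples of discrete values
# with the class label stored last.
from collections import Counter
from itertools import combinations
from math import log

def conditional_mutual_info(data, i, j, class_idx):
    """Empirical I(X_i; X_j | C) computed from joint counts."""
    n = len(data)
    n_xyc = Counter((r[i], r[j], r[class_idx]) for r in data)
    n_xc = Counter((r[i], r[class_idx]) for r in data)
    n_yc = Counter((r[j], r[class_idx]) for r in data)
    n_c = Counter(r[class_idx] for r in data)
    return sum((cnt / n) * log(cnt * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
               for (x, y, c), cnt in n_xyc.items())

def learn_tan_structure(data, n_attrs):
    """Maximum-weight spanning tree over the attributes (Prim's algorithm),
    with edge weights given by class-conditional mutual information.
    Returns a parent map rooted arbitrarily at attribute 0; in the final
    TAN model the class is additionally a parent of every attribute."""
    weight = {(i, j): conditional_mutual_info(data, i, j, n_attrs)
              for i, j in combinations(range(n_attrs), 2)}
    parent = {0: None}  # attribute 0 serves as the tree root
    while len(parent) < n_attrs:
        src, dst = max(((i, j) for i in parent for j in range(n_attrs)
                        if j not in parent),
                       key=lambda e: weight[(min(e), max(e))])
        parent[dst] = src
    return parent

if __name__ == "__main__":
    # Toy dataset: two binary attributes, binary class label last.
    data = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1), (1, 1, 1), (0, 0, 0)]
    print(learn_tan_structure(data, n_attrs=2))  # {0: None, 1: 0}
```

Classification then follows the usual Bayesian network rule: predict the class c maximizing P(c) multiplied by the product of P(x_i | pa(x_i), c) over all attributes, where each attribute's parents are the class and, except for the root, its tree parent. Because the spanning tree is computed in closed form from pairwise statistics, no search over network structures is needed, which is the computational simplicity the abstract refers to.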
