Learning Bayesian Network Classifiers: Searching in a Space of Partially Directed Acyclic Graphs

There is a commonly held opinion that algorithms for learning unrestricted Bayesian networks, especially those based on the score+search paradigm, are not suitable for building competitive Bayesian-network-based classifiers. Consequently, several specialized algorithms have been developed that carry out the search in different types of directed acyclic graph (DAG) topologies, most of them extensions (using augmenting arcs) or modifications of the basic Naive Bayes topology. In this paper, we present a new algorithm for inducing Bayesian network classifiers that obtains excellent results even when standard scoring functions are used. The method performs a simple local search in a space that differs from both unrestricted and augmented DAGs: our search space consists of a type of partially directed acyclic graph (PDAG) that combines two notions of DAG equivalence, classification equivalence and independence equivalence. The results of extensive experimentation indicate that the proposed method can compete with state-of-the-art classification algorithms.
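The score+search paradigm the abstract refers to is typically instantiated as a greedy local search over graph structures guided by a decomposable scoring function. The following is a minimal illustrative sketch of such a hill climber over plain DAGs with a BIC-style local score; it is background for the paradigm, not the authors' PDAG-based algorithm, and all names (`bic_score`, `hill_climb`) are my own.

```python
import itertools
import math
from collections import Counter

def bic_score(data, var, parents, arities):
    """BIC local score of one variable given its parent set.

    BIC is decomposable: the network score is the sum of these
    per-variable terms, so a single arc change only requires
    rescoring the child variable.
    """
    n = len(data)
    counts = Counter()         # (parent configuration, child value) -> count
    parent_counts = Counter()  # parent configuration -> count
    for row in data:
        pa = tuple(row[p] for p in parents)
        counts[(pa, row[var])] += 1
        parent_counts[pa] += 1
    log_lik = sum(c * math.log(c / parent_counts[pa])
                  for (pa, _), c in counts.items())
    q = 1
    for p in parents:
        q *= arities[p]        # number of parent configurations
    penalty = 0.5 * math.log(n) * q * (arities[var] - 1)
    return log_lik - penalty

def is_ancestor(parents, a, b):
    """True if variable a is an ancestor of variable b in the DAG."""
    stack, seen = list(parents[b]), set()
    while stack:
        x = stack.pop()
        if x == a:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return False

def hill_climb(data, variables, arities):
    """Greedy score+search: apply the best arc addition/removal until
    no single move improves the BIC score."""
    parents = {v: set() for v in variables}
    local = {v: bic_score(data, v, (), arities) for v in variables}
    improved = True
    while improved:
        improved = False
        best_delta, best_move = 1e-9, None
        for u, v in itertools.permutations(variables, 2):
            if u in parents[v]:
                new_pa = parents[v] - {u}      # candidate: remove arc u -> v
            elif is_ancestor(parents, v, u):
                continue                       # adding u -> v would close a cycle
            else:
                new_pa = parents[v] | {u}      # candidate: add arc u -> v
            delta = bic_score(data, v, tuple(sorted(new_pa)), arities) - local[v]
            if delta > best_delta:
                best_delta, best_move = delta, (v, new_pa)
        if best_move:
            v, new_pa = best_move
            parents[v] = new_pa
            local[v] = bic_score(data, v, tuple(sorted(new_pa)), arities)
            improved = True
    return parents
```

On a toy two-variable dataset where one attribute determines the other, the climber adds the corresponding arc because the likelihood gain outweighs the BIC penalty. The paper's contribution replaces the DAG search space used here with a restricted PDAG space combining classification and independence equivalence.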
