Bayesian model averaging of Bayesian network classifiers over multiple node-orders: application to sparse datasets

Bayesian model averaging (BMA) can resolve the overfitting problem by explicitly incorporating the model uncertainty into the analysis procedure. Hence, it can be used to improve the generalization performance of Bayesian network classifiers. Until now, BMA of Bayesian network classifiers has only been performed in some restricted forms, e.g., the model is averaged given a single node-order, because of its heavy computational burden. However, it can be hard to obtain a good node-order when the available training dataset is sparse. To alleviate this problem, we propose BMA of Bayesian network classifiers over several distinct node-orders obtained using the Markov chain Monte Carlo sampling technique. The proposed method was examined using two synthetic problems and four real-life datasets. First, we show that the proposed method is especially effective when the given dataset is very sparse. The classification accuracy of averaging over multiple node-orders was higher in most cases than that achieved using a single node-order in our experiments. We also present experimental results for test datasets with unobserved variables, where the quality of the averaged node-order is more important. Through these experiments, we show that the difference in classification performance between the cases of multiple node-orders and single node-order is related to the level of noise, confirming the relative benefit of averaging over multiple node-orders for incomplete data. We conclude that BMA of Bayesian network classifiers over multiple node-orders has an apparent advantage when the given dataset is sparse and noisy, despite the method's heavy computational cost.

[1]  Song-Chun Zhu,et al.  Analysis and synthesis of textured motion: particles and waves , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Nir Friedman,et al.  Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks , 2004, Machine Learning.

[3]  Byoung-Tak Zhang,et al.  System identification using evolutionary Markov chain Monte Carlo , 2001, J. Syst. Archit..

[4]  Ramón López de Mántaras,et al.  Tractable Bayesian Learning of Tree Augmented Naive Bayes Models , 2003, ICML.

[5]  Gregory F. Cooper,et al.  The ALARM Monitoring System: A Case Study with two Probabilistic Inference Techniques for Belief Networks , 1989, AIME.

[6]  Pedro Larrañaga,et al.  Learning Bayesian network structures by searching for the best ordering with genetic algorithms , 1996, IEEE Trans. Syst. Man Cybern. Part A.

[7]  J. Downing,et al.  Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells , 2003, Nature Genetics.

[8]  Adrian E. Raftery,et al.  Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors , 1999 .

[9]  Gregory F. Cooper,et al.  Model Averaging for Prediction with Discrete Bayesian Networks , 2004, J. Mach. Learn. Res..

[10]  W. A. Wright,et al.  Bayesian approach to neural-network modeling with input uncertainty , 1999, IEEE Trans. Neural Networks.

[11]  Gregory F. Cooper,et al.  Exact model averaging with naive Bayesian classifiers , 2002, ICML.

[12]  Ludmila I. Kuncheva,et al.  Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy , 2003, Machine Learning.

[13]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[14]  David Heckerman,et al.  Causal independence for probability assessment and inference using Bayesian networks , 1996, IEEE Trans. Syst. Man Cybern. Part A.

[15]  Gregory F. Cooper,et al.  The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks , 1990, Artif. Intell..

[16]  Bertrand Clarke,et al.  Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored , 2003, J. Mach. Learn. Res..

[17]  Kevin B. Korb,et al.  Bayesian Artificial Intelligence , 2004, Computer science and data analysis series.

[18]  D. Madigan,et al.  Correction to: ``Bayesian model averaging: a tutorial'' [Statist. Sci. 14 (1999), no. 4, 382--417; MR 2001a:62033] , 2000 .

[19]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[20]  David J. Spiegelhalter,et al.  Introducing Markov chain Monte Carlo , 1995 .

[21]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[22]  Finn V. Jensen,et al.  Bayesian Networks and Decision Graphs , 2001, Statistics for Engineering and Information Science.

[23]  Nir Friedman,et al.  Learning Bayesian Networks with Local Structure , 1996, UAI.