Bayesian Network Classifiers Under the Ensemble Perspective

Augmented naive Bayesian classifiers relax the original independence assumption by allowing additional dependencies among attributes. This strategy yields parameterized learners that span a wide spectrum of models of increasing complexity, so that expressiveness and efficiency can be traded off to suit the problem at hand. Recent studies have transposed this finding to the domain of bias and variance, showing that inducing complex multivariate probability distributions produces low-bias/high-variance classifiers that are especially suitable for large data domains. Frameworks such as AkDE avoid structure learning and reduce variance by averaging a full family of constrained models, at the expense of increased space and time complexity. Model selection then becomes necessary and is typically performed with information-theoretic techniques. We present a new approach to reducing the model space from the perspective of ensemble classifiers: we study each model's individual contribution to the error and how model selection affects that contribution through the aggregation process. We carry out a thorough experimental evaluation of bias stability and variance reduction, and compare the results against other popular ensemble models such as Random Forest, leading to a discussion of the effectiveness of the previous approaches. The conclusions support new strategies for designing more consistent ensemble Bayesian network classifiers, which we explore at the end of the paper.
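The averaging of a family of constrained models mentioned above can be illustrated with a minimal AODE-style sketch: one super-parent one-dependence estimator (SPODE) per attribute, with their class posteriors averaged uniformly. This is an illustrative sketch under our own assumptions (function names, Laplace smoothing with alpha = 1, and the toy discrete data are ours), not the implementation used in the paper.

```python
import numpy as np

def spode_log_posterior(X, y, x_new, parent, alpha=1.0):
    """Unnormalised log-posterior of each class under one SPODE whose
    super-parent is the given attribute, with Laplace smoothing."""
    classes = sorted(set(y))
    n, d = X.shape
    scores = {}
    for c in classes:
        Xc = X[y == c]
        # Joint P(c, x_parent): class and super-parent value together.
        vp = len(set(X[:, parent]))
        joint = (np.sum(Xc[:, parent] == x_new[parent]) + alpha) \
                / (n + alpha * len(classes) * vp)
        logp = np.log(joint)
        # Every other attribute depends on both the class and the parent:
        # P(x_i | c, x_parent).
        Xcp = Xc[Xc[:, parent] == x_new[parent]]
        for i in range(d):
            if i == parent:
                continue
            vi = len(set(X[:, i]))
            logp += np.log((np.sum(Xcp[:, i] == x_new[i]) + alpha)
                           / (len(Xcp) + alpha * vi))
        scores[c] = logp
    return scores

def aode_predict(X, y, x_new):
    """Average the normalised posteriors of all SPODEs (one per attribute)
    and return the class with the highest averaged posterior."""
    d = X.shape[1]
    classes = sorted(set(y))
    avg = {c: 0.0 for c in classes}
    for p in range(d):
        scores = spode_log_posterior(X, y, x_new, p)
        m = max(scores.values())                      # stabilise the exp
        probs = {c: np.exp(s - m) for c, s in scores.items()}
        z = sum(probs.values())
        for c in classes:
            avg[c] += probs[c] / z / d                # uniform averaging
    return max(avg, key=avg.get)
```

Each SPODE on its own is a low-complexity, biased model; averaging their posteriors is what drives the variance reduction discussed above, and selecting a subset of SPODEs (rather than averaging all of them) is the model-selection problem the paper addresses.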
