Hidden Naive Bayes

The conditional independence assumption of naive Bayes essentially ignores attribute dependencies and is often violated in practice. A Bayesian network, by contrast, can represent arbitrary attribute dependencies, but learning an optimal Bayesian network from data is intractable, chiefly because searching for the optimal network structure is prohibitively time-consuming. A Bayesian model that avoids structure learning is therefore desirable. In this paper, we propose a novel model, called hidden naive Bayes (HNB). In an HNB, a hidden parent is created for each attribute that combines the influences from all other attributes. We present an approach to creating hidden parents using the average of weighted one-dependence estimators. HNB inherits the structural simplicity of naive Bayes and can be easily learned without structure learning. We propose an algorithm for learning HNB based on conditional mutual information. We experimentally evaluate HNB in terms of classification accuracy on the 36 UCI data sets recommended by Weka (Witten & Frank 2000), comparing it to naive Bayes (Langley, Iba, & Thompson 1992), C4.5 (Quinlan 1993), SBC (Langley & Sage 1994), NBTree (Kohavi 1996), CL-TAN (Friedman, Geiger, & Goldszmidt 1997), and AODE (Webb, Boughton, & Wang 2005). The experimental results show that HNB outperforms naive Bayes, C4.5, SBC, NBTree, and CL-TAN, and is competitive with AODE.
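Concretely, the HNB classifier takes the form P(c | a_1, ..., a_n) ∝ P(c) ∏_i P(a_i | hp_i, c), where the hidden parent hp_i of attribute A_i is a weighted mixture of one-dependence estimators, P(a_i | hp_i, c) = Σ_{j≠i} W_ij P(a_i | a_j, c), with the weight W_ij proportional to the conditional mutual information I(A_i; A_j | C) and normalized over j ≠ i. The sketch below illustrates this construction in Python, assuming integer-coded discrete attributes and Laplace-smoothed counts; the cmi helper and the HNB class are illustrative names, not the authors' code.

import numpy as np

def cmi(xi, xj, y):
    # I(Xi; Xj | C): conditional mutual information between two discrete
    # attributes given the class, estimated from empirical counts.
    total = 0.0
    for c in np.unique(y):
        pc = np.mean(y == c)
        for a in np.unique(xi):
            for b in np.unique(xj):
                pabc = np.mean((xi == a) & (xj == b) & (y == c))
                pac = np.mean((xi == a) & (y == c))
                pbc = np.mean((xj == b) & (y == c))
                if pabc > 0:
                    # P(a,b,c) * log[ P(a,b|c) / (P(a|c) P(b|c)) ]
                    total += pabc * np.log(pabc * pc / (pac * pbc))
    return total

class HNB:
    # Hidden naive Bayes over integer-coded discrete attributes (a sketch;
    # assumes every test value was also seen during training).

    def fit(self, X, y):
        n, d = X.shape
        self.d_ = d
        self.classes_ = np.unique(y)
        vals = [int(X[:, i].max()) + 1 for i in range(d)]
        # Laplace-smoothed log class priors log P(c).
        self.logprior_ = {c: np.log((np.sum(y == c) + 1.0) / (n + len(self.classes_)))
                          for c in self.classes_}
        # Hidden-parent weights W[i, j] ~ I(Ai; Aj | C), normalized over j != i.
        W = np.zeros((d, d))
        for i in range(d):
            for j in range(d):
                if i != j:
                    W[i, j] = cmi(X[:, i], X[:, j], y)
        self.W_ = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
        # One-dependence tables P(ai | aj, c), Laplace-smoothed.
        self.cpt_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            for i in range(d):
                for j in range(d):
                    if i == j:
                        continue
                    t = np.ones((vals[j], vals[i]))  # Laplace prior of 1
                    for row in Xc:
                        t[row[j], row[i]] += 1.0
                    self.cpt_[(i, j, c)] = t / t.sum(axis=1, keepdims=True)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            best, best_score = None, -np.inf
            for c in self.classes_:
                score = self.logprior_[c]
                for i in range(self.d_):
                    # Hidden parent of Ai: weighted mixture of one-dependence
                    # estimates P(ai | aj, c) over all other attributes Aj.
                    p = sum(self.W_[i, j] * self.cpt_[(i, j, c)][x[j], x[i]]
                            for j in range(self.d_) if j != i)
                    score += np.log(max(p, 1e-12))
                if score > best_score:
                    best, best_score = c, score
            preds.append(best)
        return np.array(preds)

On integer-coded data the model would be used as, e.g., HNB().fit(X_train, y_train).predict(X_test). The sketch makes the abstract's point concrete: because the structure is fixed (the class plus one hidden parent per attribute), training reduces to counting the pairwise tables and weights, with no structure search at all.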

[1] Witten, I. H., and Frank, E. 1999. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.

[2] Langley, P.; Iba, W.; and Thompson, K. 1992. An Analysis of Bayesian Classifiers. In Proceedings of AAAI-92.

[3] Domingos, P., and Pazzani, M. 1996. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. In Proceedings of ICML-96.

[4] Zhang, N. L. 2002. Hierarchical Latent Class Models for Cluster Analysis. Journal of Machine Learning Research.

[5] Blake, C. L., and Merz, C. J. 1998. UCI Repository of Machine Learning Databases. University of California, Irvine.

[6] Chow, C. K., and Liu, C. N. 1968. Approximating Discrete Probability Distributions with Dependence Trees. IEEE Transactions on Information Theory 14(3):462–467.

[7] Kohavi, R. 1996. Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In Proceedings of KDD-96.

[8] Sahami, M. 1996. Learning Limited Dependence Bayesian Classifiers. In Proceedings of KDD-96.

[9] Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.

[10] Webb, G. I.; Boughton, J. R.; and Wang, Z. 2005. Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning 58(1):5–24.

[11] Chickering, D. M. 1996. Learning Bayesian Networks is NP-Complete. In Learning from Data: Artificial Intelligence and Statistics V. Springer.

[12] Witten, I. H., and Frank, E. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.

[13] Langley, P., and Sage, S. 1994. Induction of Selective Bayesian Classifiers. In Proceedings of UAI-94.

[14] Zheng, Z., and Webb, G. I. 2000. Lazy Learning of Bayesian Rules. Machine Learning 41(1):53–84.

[15] Domingos, P., and Pazzani, M. 1997. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning 29:103–130.

[16] Friedman, N.; Geiger, D.; and Goldszmidt, M. 1997. Bayesian Network Classifiers. Machine Learning 29:131–163.