Survey of Improving Naive Bayes for Classification

The attribute conditional independence assumption of naive Bayes essentially ignores attribute dependencies and is often violated in practice. On the other hand, although a Bayesian network can represent arbitrary attribute dependencies, learning an optimal Bayesian network classifier from data is intractable. Thus, learning improved versions of naive Bayes has attracted much attention from researchers, and many effective and efficient improved algorithms have been proposed. In this paper, we review some of these algorithms and single out four main improvement approaches: 1) feature selection; 2) structure extension; 3) local learning; 4) data expansion. We experimentally tested these approaches on the 36 UCI data sets selected by Weka and compared them to naive Bayes. The experimental results show that all of these approaches are effective. Finally, we discuss some main directions for future research on Bayesian network classifiers.
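
As a point of reference (a standard formulation of the classifier, not taken verbatim from this survey), the conditional independence assumption lets naive Bayes classify a test instance $x$ with attribute values $a_1, \dots, a_n$ as

$$c(x) = \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(a_i \mid c),$$

so every attribute is conditioned on the class alone. The four improvement approaches relax or work around this assumption in different ways, for example by discarding redundant attributes (feature selection) or by allowing each attribute a limited number of additional parent attributes beyond the class (structure extension).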
