Not So Naive Bayes: Aggregating One-Dependence Estimators

Of the many proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, both LBR and Super-Parent TAN have demonstrated remarkably low error. However, both techniques obtain this outcome at considerable computational cost. We present a new approach to weakening the attribute independence assumption: averaging all classifiers from a constrained class of one-dependence estimators. In extensive experiments this technique delivers prediction accuracy comparable to LBR and Super-Parent TAN, with substantially better computational efficiency at test time relative to the former and at training time relative to the latter. The new algorithm is shown to have low variance and is well suited to incremental learning.
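To make the averaging idea concrete, here is a minimal Python sketch of an averaged one-dependence estimator in the spirit the abstract describes: each attribute value in turn acts as a "super-parent" on which every other attribute is conditioned, and the resulting one-dependence estimates of the joint probability P(y, x) are averaged. The class name, the parent-frequency threshold m, and the Laplace smoothing choices are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
import numpy as np
from collections import defaultdict

class AODE:
    """Sketch of averaged one-dependence estimation for discrete attributes.

    Scores class c for a test vector x by averaging, over every attribute
    x_i whose value is frequent enough to act as a super-parent,
        P(c, x_i) * prod_{j != i} P(x_j | c, x_i),
    and falls back to naive Bayes when no parent value qualifies.
    """

    def __init__(self, m=30):
        self.m = m  # minimum training frequency for a parent value (assumed)

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.n_, self.d_ = X.shape
        self.classes_ = np.unique(y)
        self.vals_ = [np.unique(X[:, i]) for i in range(self.d_)]
        self.c_y_ = defaultdict(int)      # count(c)
        self.c_xi_ = defaultdict(int)     # count(x_i = v)
        self.c_yxi_ = defaultdict(int)    # count(c, x_i = v)
        self.c_yxixj_ = defaultdict(int)  # count(c, x_i = v, x_j = w)
        for xrow, c in zip(X, y):
            self.c_y_[c] += 1
            for i, v in enumerate(xrow):
                self.c_xi_[(i, v)] += 1
                self.c_yxi_[(c, i, v)] += 1
                for j, w in enumerate(xrow):
                    if j != i:
                        self.c_yxixj_[(c, i, v, j, w)] += 1
        return self

    def _joint(self, c, x):
        # Average the one-dependence estimates over all qualifying parents.
        score, parents = 0.0, 0
        for i, v in enumerate(x):
            if self.c_xi_[(i, v)] < self.m:
                continue
            parents += 1
            # Laplace-smoothed P(c, x_i = v)
            p = ((self.c_yxi_[(c, i, v)] + 1.0)
                 / (self.n_ + len(self.classes_) * len(self.vals_[i])))
            for j, w in enumerate(x):
                if j != i:
                    # Laplace-smoothed P(x_j = w | c, x_i = v)
                    p *= ((self.c_yxixj_[(c, i, v, j, w)] + 1.0)
                          / (self.c_yxi_[(c, i, v)] + len(self.vals_[j])))
            score += p
        if parents:
            return score / parents
        # Fallback: plain naive Bayes when no parent value is frequent enough.
        p = (self.c_y_[c] + 1.0) / (self.n_ + len(self.classes_))
        for j, w in enumerate(x):
            p *= ((self.c_yxi_[(c, j, w)] + 1.0)
                  / (self.c_y_[c] + len(self.vals_[j])))
        return p

    def predict(self, X):
        X = np.asarray(X)
        return np.array([max(self.classes_, key=lambda c: self._joint(c, x))
                         for x in X])
```

Note that training reduces to accumulating counts in a single pass, which is why this family of classifiers supports incremental learning: a new example can be absorbed by incrementing the same counters, with no retraining.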
