Searching for Dependencies in Bayesian Classifiers

Naive Bayesian classifiers, which assume that attributes are independent given the class, perform remarkably well on some data sets but poorly on others. We explore ways to improve the Bayesian classifier by searching for dependencies among attributes. We propose and evaluate two algorithms for detecting such dependencies and show that the backward sequential elimination and joining algorithm provides the most improvement over the naive Bayesian classifier. The largest improvements occur in domains where the naive Bayesian classifier is significantly less accurate than a decision tree learner. This suggests that the attributes used in some common databases are not independent conditioned on the class, and that the violations of the independence assumption that affect the accuracy of the classifier can be detected from training data.
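To illustrate why joining dependent attributes can help, the following is a minimal sketch and not the paper's implementation. It assumes categorical attributes, Laplace smoothing, and accuracy measured on the training data itself; the function names `train`, `predict`, `join_attributes`, and `accuracy` are hypothetical. On XOR-style data, where the class depends on two attributes only in combination, a naive Bayesian classifier cannot do better than chance until the two attributes are replaced by a single joined attribute.

```python
from collections import Counter, defaultdict
import math


def train(rows, labels):
    """Fit a categorical naive Bayes model (counts only)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value frequencies
    values = defaultdict(set)            # attribute index -> distinct values seen
    for row, y in zip(rows, labels):
        for k, v in enumerate(row):
            value_counts[(k, y)][v] += 1
            values[k].add(v)
    return class_counts, value_counts, values


def predict(model, row):
    """Return the class with the highest smoothed log posterior."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for k, v in enumerate(row):
            # Laplace smoothing (an assumption of this sketch).
            score += math.log((value_counts[(k, y)][v] + 1) / (n_y + len(values[k])))
        if score > best_score:
            best, best_score = y, score
    return best


def join_attributes(rows, i, j):
    """Replace attributes i and j with one attribute holding the value pair."""
    return [
        tuple(v for k, v in enumerate(r) if k not in (i, j)) + ((r[i], r[j]),)
        for r in rows
    ]


def accuracy(rows, labels):
    """Training-set accuracy; a simplification chosen to keep the sketch short."""
    model = train(rows, labels)
    return sum(predict(model, r) == y for r, y in zip(rows, labels)) / len(rows)


# XOR data: neither attribute alone says anything about the class.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [a ^ b for a, b in rows]

naive_acc = accuracy(rows, labels)
joined_acc = accuracy(join_attributes(rows, 0, 1), labels)
```

A search procedure in the spirit of the abstract would repeatedly try candidate joins (and, for backward elimination, attribute deletions), keeping a change only when it improves an accuracy estimate computed from the training data.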
