Exploring Conditions for the Optimality of Naïve Bayes

Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of Naive Bayes in classification? In this paper, we propose a novel explanation for the good classification performance of Naive Bayes. We show that, essentially, the dependence distribution plays a crucial role. Here, dependence distribution means how the local dependence of an attribute distributes in each class, evenly or unevenly, and how the local dependences of all attributes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Specifically, we show that no matter how strong the dependences among attributes are, Naive Bayes can still be optimal if the dependences distribute evenly across the classes, or if the dependences cancel each other out. We propose and prove a sufficient and necessary condition for the optimality of Naive Bayes. Further, we investigate the optimality of Naive Bayes under the Gaussian distribution, and present and prove a sufficient condition for its optimality in which dependences among attributes do exist; this provides evidence that dependences may cancel each other out. Our theoretical analysis can be used in designing learning algorithms. In fact, a major class of learning algorithms for Bayesian networks is conditional independence-based (CI-based), which is essentially based on dependence. We design a dependence distribution-based algorithm by extending the Chow-Liu algorithm, a widely used CI-based algorithm. Our experiments show that the new algorithm outperforms the Chow-Liu algorithm, which also provides empirical evidence supporting our new explanation.
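To make the optimality claim concrete, the following is a minimal sketch of the two-class setting the abstract refers to; the notation (an example E = (x_1, ..., x_n), classes + and -, and the ratios f and f_nb) is introduced here for illustration and is not quoted from the paper. The Bayes classifier compares the true posterior odds to 1, while Naive Bayes replaces them with the product form implied by the conditional independence assumption:

\[
f(E) = \frac{P(+\mid E)}{P(-\mid E)},
\qquad
f_{nb}(E) = \frac{P(+)}{P(-)} \prod_{i=1}^{n} \frac{P(x_i \mid +)}{P(x_i \mid -)},
\]

each classifier assigning class + exactly when its ratio is at least 1. Under zero-one loss, optimality of Naive Bayes amounts to the two ratios falling on the same side of 1 for every example E (ignoring ties), so arbitrarily strong dependences among the attributes can distort the value of f_nb(E) without changing a single classification, as long as this sign agreement is preserved.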

[1] Charles X. Ling, et al. Learnability of Augmented Naive Bayes in Nominal Domains, 2001, ICML.

[2] D. Hand, et al. Idiot's Bayes—Not So Stupid After All?, 2001.

[3] Dan Roth, et al. Understanding Probabilistic Classifiers, 2001, ECML.

[4] Pedro M. Domingos, et al. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier, 1996, ICML.

[5] Michael J. Pazzani, et al. Searching for Dependencies in Bayesian Classifiers, 1995, AISTATS.

[6] Leonard E. Trigg, et al. Naive Bayes for Regression, 1998.

[7] Gregory F. Cooper, et al. A Bayesian Network Classifier that Combines a Finite Mixture Model and a Naive Bayes Model, 1999, UAI.

[8] Paul N. Bennett. Assessing the Calibration of Naive Bayes Posterior Estimates, 2000.

[9] Leonard E. Trigg, et al. Technical Note: Naive Bayes for Regression, 2000, Machine Learning.

[10] Dan Roth, et al. Learning in Natural Language, 1999, IJCAI.

[11] Alberto Maria Segre, et al. Programs for Machine Learning, 1994.

[12] William S. Cooper. Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval, 1995.

[13] Jerome H. Friedman, et al. On Bias, Variance, 0/1—Loss, and the Curse-of-Dimensionality, 2004, Data Mining and Knowledge Discovery.

[14] Nir Friedman, et al. Bayesian Network Classifiers, 1997, Machine Learning.

[15] Pedro M. Domingos, et al. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, 1997, Machine Learning.

[16] D. J. Newman, et al. UCI Repository of Machine Learning Databases, 1998.

[17] Pat Langley, et al. An Analysis of Bayesian Classifiers, 1992, AAAI.

[18] C. K. Chow, et al. Approximating discrete probability distributions with dependence trees, 1968, IEEE Trans. Inf. Theory.

[19] Usama M. Fayyad, et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, 1993, IJCAI.