Learning the naive Bayes classifier with optimization models

Abstract: Naive Bayes is one of the simplest probabilistic classifiers. Despite its strong assumption that all features are conditionally independent given the class, it often performs surprisingly well in real-world applications. When the classifier is learned with its fixed structure, class probabilities and conditional probabilities are estimated from the training data, and these estimates are then used to classify new observations. In this paper, we introduce three novel optimization models for the naive Bayes classifier in which both the class probabilities and the conditional probabilities are treated as variables, and their values are obtained by solving the corresponding optimization problems. Numerical experiments are conducted on several real-world binary classification data sets, with continuous features discretized by three different methods. The performance of the proposed models is compared with the naive Bayes classifier, tree-augmented naive Bayes, support vector machines (SVM), C4.5, and the nearest neighbor classifier. The results demonstrate that the proposed models can significantly improve the performance of the naive Bayes classifier while maintaining its simple structure.
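To make the baseline concrete, the following is a minimal sketch of the standard frequency-count naive Bayes procedure that the proposed optimization models start from: class priors and per-feature conditional probability tables are estimated from (already discretized) training data and then combined to score new observations. The function names, Laplace smoothing of the conditional tables, and the handling of unseen feature values are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of standard naive Bayes with discretized features.
    # Smoothing and unseen-value handling are illustrative choices.
    from collections import Counter, defaultdict
    import math

    def train_naive_bayes(X, y, alpha=1.0):
        """Estimate class priors P(c) and conditional tables P(x_j = v | c)."""
        class_counts = Counter(y)
        classes = sorted(class_counts)
        priors = {c: class_counts[c] / len(y) for c in classes}
        counts = defaultdict(lambda: defaultdict(Counter))  # counts[j][c][v]
        values = defaultdict(set)                           # observed values of feature j
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                counts[j][yi][v] += 1
                values[j].add(v)
        cond = {j: {c: {v: (counts[j][c][v] + alpha) /
                           (sum(counts[j][c].values()) + alpha * len(values[j]))
                        for v in values[j]}
                    for c in classes}
                for j in values}
        return priors, cond

    def predict_naive_bayes(x, priors, cond):
        """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
        best_class, best_score = None, -math.inf
        for c, prior in priors.items():
            score = math.log(prior)
            for j, v in enumerate(x):
                score += math.log(cond[j][c].get(v, 1e-9))  # small floor for unseen values
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    # Toy usage: two discretized features, binary class labels.
    X = [(0, 1), (1, 1), (0, 0), (1, 0)]
    y = [0, 0, 1, 1]
    priors, cond = train_naive_bayes(X, y)
    print(predict_naive_bayes((0, 1), priors, cond))

In the paper's approach, the entries of these prior and conditional probability tables become variables of an optimization problem rather than frequency estimates; the sketch above only reproduces the classical counting baseline.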
