Fuzzy Naive Bayes classifier based on fuzzy clustering

Despite its unrealistic independence assumption, the Naive Bayes classifier is remarkably successful in practice. The Naive Bayes classifier assumes that all variables are nominal, that is, each variable takes only a finite number of values. In large databases, however, variables (or fields) often take continuous values or a large number of distinct numerical values, so many researchers have studied the discretization (crisp partitioning) of the domains of continuous variables. We generalize the Naive Bayes classifier to the case in which the variable domains are partitioned fuzzily instead of being discretized, so that each variable in the Fuzzy Naive Bayes classifier takes a linguistic value, that is, a fuzzy set. This paper proposes a method for estimating the conditional probabilities of the Fuzzy Naive Bayes classifier from an observed data set, and presents a method for predicting the class label of any numeric input with this classifier. In the training phase, the training data (the feature variables only, without class labels) are first clustered in an unsupervised way by fuzzy c-means or a similar algorithm; the optimal cluster centers are then used to determine the fuzzy partition of the feature-variable space. This generalization can reduce the complexity of learning an optimal discretization, a problem the classical Naive Bayes classifier often faces, reduce the information lost through discretization, and improve the ability to handle imprecise data and large databases. Tests on several well-known classification problems from the machine learning field show that the Fuzzy Naive Bayes classifier is an effective alternative for classification problems with continuous variables.
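The unsupervised training step can be sketched as follows. This is a minimal NumPy sketch that assumes standard fuzzy c-means with fuzzifier m = 2. The abstract does not say exactly how cluster centers become a fuzzy partition, so the sketch assumes one plausible rule: project the centers onto each feature axis and use the sorted projections as the peaks of that feature's fuzzy sets. The names `fuzzy_c_means` and `partition_peaks` are hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, U); U has shape (c, n), columns sum to 1."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)          # random initial fuzzy memberships
    for _ in range(max_iter):
        Um = U ** m
        # Each center is the membership-weighted mean of the data.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # d[i, k] = Euclidean distance from center i to point k.
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                  # guard against a point sitting on a center
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_peaks(centers):
    """Hypothetical rule: project the centers onto each feature axis and use the
    sorted, de-duplicated projections as the peaks of that feature's fuzzy sets."""
    return [np.unique(centers[:, j]) for j in range(centers.shape[1])]


# Usage: cluster the unlabeled training features, then derive per-feature peaks.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centers, U = fuzzy_c_means(X, c=2)
peaks = partition_peaks(centers)
print(peaks[0])  # one peak near each cluster on feature 0
```

Only the feature values enter the clustering; class labels are used later, when the conditional probabilities are estimated.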
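Estimation and prediction with a fuzzy partition can be sketched as below. This is one plausible reading of the abstract, not necessarily the paper's exact estimator. It assumes triangular fuzzy sets whose memberships sum to 1 on each feature. It estimates P(A_jk | c) as the class-conditional sum of memberships, in the spirit of Zadeh's probability of a fuzzy event, with Laplace-style smoothing added as an extra assumption. It scores a numeric value as P(x_j | c) = Σ_k μ_jk(x_j) P(A_jk | c). The class `FuzzyNaiveBayes`, its `alpha` parameter, and the peaks in the usage lines are hypothetical.

```python
import numpy as np


def memberships(x, peaks):
    """Triangular fuzzy sets peaked at sorted `peaks`, with shoulders at both ends.
    Returns shape (len(x), len(peaks)); each row sums to 1 (a fuzzy partition)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    eye = np.eye(len(peaks))
    # np.interp on a unit vector gives the k-th triangular membership function.
    return np.stack([np.interp(x, peaks, eye[k]) for k in range(len(peaks))], axis=1)


class FuzzyNaiveBayes:
    def __init__(self, peaks, alpha=1.0):
        self.peaks = [np.asarray(p, dtype=float) for p in peaks]  # one sorted array per feature
        self.alpha = alpha  # smoothing constant (assumption, keeps probabilities > 0)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.prior_ = np.array([np.mean(y == c) for c in self.classes_])
        # cond_[j][c, k] estimates P(A_jk | class c): summed memberships, normalized over k.
        self.cond_ = []
        for j, p in enumerate(self.peaks):
            mu = memberships(X[:, j], p)                                   # (n, K_j)
            counts = np.array([mu[y == c].sum(axis=0) for c in self.classes_])
            counts += self.alpha
            self.cond_.append(counts / counts.sum(axis=1, keepdims=True))  # (C, K_j)
        return self

    def predict_proba(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        log_post = np.tile(np.log(self.prior_), (X.shape[0], 1))          # (n, C)
        for j, p in enumerate(self.peaks):
            mu = memberships(X[:, j], p)                                   # (n, K_j)
            # P(x_j | c) = sum_k mu_jk(x_j) * P(A_jk | c), accumulated in log space.
            log_post += np.log(mu @ self.cond_[j].T)
        log_post -= log_post.max(axis=1, keepdims=True)                   # numerical stability
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Usage with a hypothetical three-set partition on each feature.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
y = [0, 0, 0, 1, 1, 1]
model = FuzzyNaiveBayes([[0.1, 2.5, 5.0], [0.1, 2.5, 5.0]]).fit(X, y)
print(model.predict([[0.3, 0.0], [4.8, 5.1]]))  # -> [0 1]
```

Working in log space avoids underflow when there are many features, and because each row of memberships sums to 1, every numeric input spreads exactly one unit of evidence across the fuzzy sets of each feature.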
