Semi-naive Bayesian Classification by Weighted Kernel Density Estimation

Naive Bayes is one of the most popular methods for supervised classification. Its attribute conditional independence assumption makes Naive Bayes efficient, but it adversely affects the quality of classification results in many real-world applications. In this paper, a new feature-selection-based method is proposed for semi-naive Bayesian classification that relaxes this assumption. A weighted kernel density model is first proposed for Bayesian modeling, implementing a soft feature-selection scheme. We then propose an efficient algorithm that learns an optimized set of feature weights, using the least-squares cross-validation method for optimal bandwidth selection. Experimental studies on six real-world datasets show the effectiveness and suitability of the proposed method for efficient Bayesian classification.
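The abstract describes the method only at a high level. The sketch below illustrates the core idea under stated assumptions: integer-coded categorical features smoothed with the Aitchison-Aitken kernel (a standard kernel for categorical density estimation) and a simple grid-search implementation of least-squares cross-validation. The names (WeightedKDENaiveBayes, aa_kernel, lscv_bandwidth) and the choice to select bandwidths on the pooled data are illustrative assumptions, not the paper's exact procedure. The point it demonstrates is that each feature's kernel bandwidth acts as a soft weight: at its upper limit the kernel becomes uniform, so the feature is effectively switched off.

```python
import numpy as np

def aa_kernel(u, v, lam, c):
    """Aitchison-Aitken kernel for a categorical feature with c >= 2 categories.
    lam lies in [0, (c-1)/c]; at lam = (c-1)/c the kernel is uniform (1/c),
    so the feature contributes nothing -- a soft feature-selection effect."""
    return np.where(u == v, 1.0 - lam, lam / (c - 1))

def lscv_bandwidth(x, c, grid=50):
    """Least-squares cross-validation for one categorical feature (n >= 2).
    Minimizes sum_v p_hat(v)^2 - (2/n) sum_i p_hat_{-i}(x_i) over lam."""
    n = len(x)
    best_lam, best_score = 0.0, np.inf
    for lam in np.linspace(0.0, (c - 1) / c, grid):
        # density estimate at every category value
        p = np.array([aa_kernel(v, x, lam, c).mean() for v in range(c)])
        # leave-one-out estimate at each sample; K(x_i, x_i) = 1 - lam
        loo = (n * p[x] - (1.0 - lam)) / (n - 1)
        score = np.sum(p ** 2) - 2.0 * loo.mean()
        if score < best_score:
            best_lam, best_score = lam, score
    return best_lam

class WeightedKDENaiveBayes:
    """Naive-Bayes-style classifier whose per-feature kernel bandwidths,
    chosen by least-squares cross-validation, act as soft feature weights.
    Expects X as an (n, d) integer array of category codes, y as (n,) labels."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([(y == k).mean() for k in self.classes_])
        self.cats_ = [int(X[:, j].max()) + 1 for j in range(X.shape[1])]
        # Assumption: bandwidths selected on the pooled data for simplicity;
        # the paper's scheme may select them per class instead.
        self.lams_ = [lscv_bandwidth(X[:, j], self.cats_[j])
                      for j in range(X.shape[1])]
        self.Xk_ = [X[y == k] for k in self.classes_]
        return self

    def predict(self, X):
        scores = np.zeros((len(X), len(self.classes_)))
        for ki, Xk in enumerate(self.Xk_):
            s = np.log(self.priors_[ki])
            for j, (lam, c) in enumerate(zip(self.lams_, self.cats_)):
                # class-conditional kernel density of feature j
                dens = np.array([aa_kernel(x, Xk[:, j], lam, c).mean()
                                 for x in X[:, j]])
                s = s + np.log(dens + 1e-12)
            scores[:, ki] = s
        return self.classes_[np.argmax(scores, axis=1)]
```

With lam fixed at 0 for every feature this reduces to plain frequency-based Naive Bayes; letting cross-validation push an irrelevant feature's lam toward its uniform limit is what relaxes the independence assumption's harmful effect.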
