Alleviating naive Bayes attribute independence assumption by attribute weighting

Despite its simplicity, the naive Bayes classifier continues to perform well against more sophisticated newcomers and therefore remains of great interest to the machine learning community. Among the numerous approaches to refining naive Bayes, attribute weighting has received less attention than it warrants. Most such approaches, perhaps influenced by attribute weighting in other machine learning algorithms, use weighting to place greater emphasis on highly predictive attributes than on less predictive ones. In this paper, we argue that for naive Bayes, attribute weighting should instead be used to alleviate the conditional attribute independence assumption. Based on this premise, we propose a weighted naive Bayes algorithm, called WANBIA, that selects weights to minimize either the negative conditional log-likelihood or the mean squared error objective function. We perform extensive evaluations and find that WANBIA is a competitive alternative to state-of-the-art classifiers such as Random Forest, Logistic Regression, and A1DE.
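
The weighting scheme the abstract describes admits a compact sketch: each class is scored as log P(y) + sum_i w_i log P(x_i | y), the per-attribute probabilities are estimated generatively from counts, and the weight vector w is then fit discriminatively by minimizing the negative conditional log-likelihood. Below is a minimal illustration in Python under stated assumptions: discrete attributes, Laplace smoothing, L-BFGS-B as the bound-constrained optimizer, and a [0, 1] box on the weights. Function names and these specific choices are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def fit_counts(X, y, n_classes, n_values):
    """Laplace-smoothed estimates of log P(y) and log P(x_i = v | y)."""
    n, d = X.shape
    class_counts = np.bincount(y, minlength=n_classes) + 1.0
    log_prior = np.log(class_counts / class_counts.sum())
    log_cond = np.zeros((d, n_classes, n_values))
    for i in range(d):
        for c in range(n_classes):
            counts = np.bincount(X[y == c, i], minlength=n_values) + 1.0
            log_cond[i, c] = np.log(counts / counts.sum())
    return log_prior, log_cond

def class_scores(w, X, log_prior, log_cond):
    """Unnormalized log-posterior: log P(y) + sum_i w_i * log P(x_i | y)."""
    n = X.shape[0]
    scores = np.tile(log_prior, (n, 1))               # shape (n, n_classes)
    for i in range(X.shape[1]):
        scores += w[i] * log_cond[i][:, X[:, i]].T    # weighted attribute i
    return scores

def neg_cond_log_likelihood(w, X, y, log_prior, log_cond):
    """Objective: -sum_n log P(y_n | x_n) under attribute weights w."""
    scores = class_scores(w, X, log_prior, log_cond)
    log_norm = np.logaddexp.reduce(scores, axis=1)    # log-sum-exp per row
    return -(scores[np.arange(len(y)), y] - log_norm).sum()

def fit_weights(X, y, n_classes, n_values):
    """Estimate probabilities generatively, then optimize the weights."""
    log_prior, log_cond = fit_counts(X, y, n_classes, n_values)
    d = X.shape[1]
    res = minimize(neg_cond_log_likelihood, x0=np.ones(d),
                   args=(X, y, log_prior, log_cond),
                   method="L-BFGS-B",                 # gradient via finite differences
                   bounds=[(0.0, 1.0)] * d)           # the [0, 1] box is an assumption
    return res.x, log_prior, log_cond
```

With every weight fixed at 1 this reduces to standard naive Bayes; letting the optimizer shrink the weight of a strongly correlated attribute discounts the double-counted evidence that the conditional independence assumption would otherwise introduce, which is precisely the role the paper argues weighting should play.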
