When is the Naive Bayes approximation not so naive?

The Naive Bayes approximation (NBA) and its associated classifier are widely used and offer robust performance across a large spectrum of problem domains. This robustness is somewhat puzzling, as the approximation rests on a very strong assumption: conditional independence among features. Various hypotheses have been put forward to explain its success, and many generalizations have been proposed. In this paper we propose a set of “local” error measures, associated with the likelihood functions for subsets of attributes and for each class, and show explicitly how these local errors combine to give a “global” error associated with the full attribute set. In so doing we formulate a framework within which the phenomenon of error cancellation, or augmentation, can be quantified and its impact on classifier performance estimated and predicted a priori. These diagnostics allow us to develop a deeper and more quantitative understanding of why the NBA is so robust and under what circumstances it can be expected to break down. We show how these diagnostics can be used to select which features to combine, and we use them in a simple generalization of the NBA, applying the resulting classifier to a set of real-world data sets.
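For concreteness, the approximation in question takes the standard form (generic notation, not drawn verbatim from the paper): for a class C and attributes x_1, …, x_n, the class-conditional joint likelihood is replaced by a product of one-dimensional marginals,

P(x_1, …, x_n | C) ≈ ∏_{i=1}^{n} P(x_i | C),

so that classification reduces to comparing P(C) ∏_i P(x_i | C) across classes. The “local” errors discussed above can then be read, roughly, as measures of how far such factorized likelihoods deviate from the true likelihoods on subsets of the attributes.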
