Classification using Hierarchical Naïve Bayes models

Classification problems have a long history in the machine learning literature. One of the simplest, yet most consistently well-performing, families of classifiers is the Naïve Bayes model. An inherent limitation of these classifiers, however, is the assumption that all attributes used to describe an instance are conditionally independent given the class of that instance. When this assumption is violated, as it often is in practice, classification accuracy can suffer from “information double-counting” and from the omission of attribute interactions. In this paper we focus on a relatively new class of models, termed Hierarchical Naïve Bayes models, which extend the modeling flexibility of Naïve Bayes models by introducing latent variables to relax some of these independence statements. We propose a simple algorithm for learning Hierarchical Naïve Bayes models in the context of classification. Experimental results show that the learned models can significantly improve classification accuracy compared with other frameworks.
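To make the contrast concrete, the following minimal Python sketch compares the standard Naïve Bayes classification rule with a hierarchical variant in which a binary latent variable H is inserted between the class and a pair of correlated attributes, so conditional independence is only assumed given H. This is illustrative only and is not the learning algorithm from the paper: all probability tables are hypothetical hand-picked values standing in for learned parameters, and the function names are the author's own.

```python
# Minimal sketch (assumed parameters, not learned ones) contrasting the
# Naive Bayes classification rule with a Hierarchical Naive Bayes variant.
import numpy as np

# Two classes, three binary attributes A1, A2, A3; A2 and A3 are correlated.
p_c = np.array([0.6, 0.4])                      # P(C)
p_a1_given_c = np.array([[0.8, 0.2],            # P(A1 | C): rows = c, cols = a1
                         [0.3, 0.7]])
p_a2_given_c = np.array([[0.7, 0.3],            # P(A2 | C)
                         [0.4, 0.6]])
p_a3_given_c = np.array([[0.7, 0.3],            # P(A3 | C)
                         [0.4, 0.6]])

def naive_bayes_posterior(a1, a2, a3):
    """P(C | a1, a2, a3) under full conditional independence.
    The shared information in the correlated pair (A2, A3) is
    effectively counted twice."""
    joint = (p_c * p_a1_given_c[:, a1]
                 * p_a2_given_c[:, a2]
                 * p_a3_given_c[:, a3])
    return joint / joint.sum()

# Hierarchical variant: a binary latent variable H mediates the correlated
# pair, so A2 and A3 are independent only given H (hypothetical tables).
p_h_given_c = np.array([[0.9, 0.1],             # P(H | C): rows = c, cols = h
                        [0.2, 0.8]])
p_a2_given_h = np.array([[0.8, 0.2],            # P(A2 | H)
                         [0.3, 0.7]])
p_a3_given_h = np.array([[0.8, 0.2],            # P(A3 | H)
                         [0.3, 0.7]])

def hierarchical_nb_posterior(a1, a2, a3):
    """P(C | a1, a2, a3) with H marginalized out:
    P(C) * P(A1|C) * sum_h P(h|C) P(A2|h) P(A3|h)."""
    latent_term = (p_h_given_c
                   * p_a2_given_h[:, a2]
                   * p_a3_given_h[:, a3]).sum(axis=1)   # sum over h
    joint = p_c * p_a1_given_c[:, a1] * latent_term
    return joint / joint.sum()

if __name__ == "__main__":
    print("NB :", naive_bayes_posterior(0, 0, 0))
    print("HNB:", hierarchical_nb_posterior(0, 0, 0))
```

Running the sketch shows the Naïve Bayes posterior pushed to a more extreme value than the hierarchical one on the same evidence, since the latent variable absorbs the redundancy between A2 and A3 rather than letting each attribute contribute independent evidence about the class.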
