Latent Tree Classifier

We propose a novel generative model for classification called the latent tree classifier (LTC). An LTC represents each class-conditional distribution of the attributes with a latent tree model and uses Bayes' rule to make predictions. Latent tree models can capture complex relationships among attributes, so an LTC can closely approximate the true distribution underlying the data and thus achieve good classification accuracy. We present an algorithm for learning LTCs and empirically evaluate it on 37 UCI data sets. The results show that LTC compares favorably to the state of the art. We also demonstrate that LTC can reveal underlying concepts and discover interesting subgroups within each class.
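
The generative prediction rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `GenerativeBayesClassifier` and the helpers `_fit_class_model` and `_class_loglik` are hypothetical, and the per-class independent-attribute model used here is only a stand-in for the latent tree models that LTC actually learns for each class.

```python
import numpy as np

class GenerativeBayesClassifier:
    """Sketch of the generative classification rule: fit one class-conditional
    model per class, then classify by Bayes' rule in log space.
    The class-conditional model here is a placeholder (independent categorical
    attributes with Laplace smoothing); in LTC it would be a latent tree model."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (illustrative choice)

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        self.n_values_ = X.max(axis=0) + 1  # attributes assumed coded 0..K-1
        # Class priors P(c), estimated by relative frequency.
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        # One class-conditional model P(x | c) per class.
        self.models_ = [self._fit_class_model(X[y == c]) for c in self.classes_]
        return self

    def _fit_class_model(self, Xc):
        # Placeholder class-conditional model: one smoothed categorical
        # distribution per attribute (a latent tree learner would go here).
        tables = []
        for j, k in enumerate(self.n_values_):
            counts = np.bincount(Xc[:, j], minlength=k) + self.alpha
            tables.append(np.log(counts / counts.sum()))
        return tables

    def _class_loglik(self, model, X):
        # log P(x | c) for every row of X under the placeholder model.
        return sum(model[j][X[:, j]] for j in range(X.shape[1]))

    def predict(self, X):
        X = np.asarray(X)
        # Bayes' rule: argmax_c  log P(c) + log P(x | c)
        scores = np.column_stack(
            [self.log_prior_[i] + self._class_loglik(m, X)
             for i, m in enumerate(self.models_)])
        return self.classes_[scores.argmax(axis=1)]
```

Swapping the placeholder per-class density for a learned latent tree model changes only `_fit_class_model` and `_class_loglik`; the Bayes'-rule prediction step stays the same.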
