Unsupervised naive Bayes for data clustering with mixtures of truncated exponentials

In this paper we propose a naive Bayes model for unsupervised data clustering, in which the class variable is hidden. The feature variables can be discrete or continuous, because the conditional distributions are represented as mixtures of truncated exponentials (MTEs). The number of classes is determined using the data augmentation algorithm. The proposed model is compared with the conditional Gaussian model on several real-world and synthetic databases.
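As a rough illustration of the model described above, the following sketch evaluates a one-piece MTE density of the form a0 + sum_i a_i * exp(b_i * x) on an interval and uses it to compute class posteriors under the naive Bayes independence assumption. The function names, the single-interval simplification, and all coefficients are assumptions made for this example; they are not taken from the paper, and the sketch does not cover parameter learning or the data augmentation step.

```python
import math


def mte_density(x, a0, terms, lo, hi):
    """Evaluate a one-piece MTE density a0 + sum(a_i * exp(b_i * x)) on [lo, hi].

    `terms` is a list of (a_i, b_i) pairs. Outside [lo, hi] the density is 0.
    """
    if not (lo <= x <= hi):
        return 0.0
    return a0 + sum(a * math.exp(b * x) for a, b in terms)


def normalized_exponential(b, lo, hi):
    """Return (a0, terms, lo, hi) for the density proportional to exp(b * x) on [lo, hi].

    Illustrative helper (hypothetical): the simplest MTE, with one exponential term.
    """
    z = (math.exp(b * hi) - math.exp(b * lo)) / b  # integral of exp(b * x) over [lo, hi]
    return 0.0, [(1.0 / z, b)], lo, hi


def class_posterior(x, priors, densities):
    """Compute P(C = c | x), proportional to P(c) * prod_j f_j(x_j | c).

    `densities[c][j]` holds the MTE parameters of feature j given class c.
    """
    joint = []
    for c, prior in enumerate(priors):
        p = prior
        for j, xj in enumerate(x):
            a0, terms, lo, hi = densities[c][j]
            p *= mte_density(xj, a0, terms, lo, hi)
        joint.append(p)
    total = sum(joint)
    if total == 0.0:
        raise ValueError("observation has zero density under every class")
    return [p / total for p in joint]


# Two hidden classes, one continuous feature on [0, 1] (made-up parameters).
priors = [0.5, 0.5]
densities = [
    [normalized_exponential(-2.0, 0.0, 1.0)],  # class 0: mass concentrated near 0
    [normalized_exponential(2.0, 0.0, 1.0)],   # class 1: mass concentrated near 1
]
posterior_low = class_posterior([0.1], priors, densities)
posterior_mid = class_posterior([0.5], priors, densities)
```

In a clustering setting, posteriors of this kind would be used to assign (or sample) the hidden class of each record; how the MTE parameters and the number of classes are estimated is left to the paper's method.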
