Data clustering using hidden variables in hybrid Bayesian networks

In this paper, we analyze the problem of data clustering in domains where discrete and continuous variables coexist. We propose the use of hybrid Bayesian networks with a naïve Bayes structure and a hidden class variable. The model integrates discrete and continuous features by representing the conditional distributions as mixtures of truncated exponentials (MTEs). The number of classes is determined through an iterative procedure based on a variation of the data augmentation algorithm. The new model is compared with an EM-based clustering algorithm in which each class model is a product of conditionally independent probability distributions and the number of clusters is chosen by cross-validation. Experiments carried out on real-world and synthetic data sets show that the proposal is competitive with state-of-the-art methods. Although the methodology introduced in this manuscript is based on MTEs, it can easily be instantiated with other similar models, such as mixtures of polynomials or mixtures of truncated basis functions in general.
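To make the overall scheme concrete, the following is a minimal sketch of a data-augmentation loop for naïve Bayes clustering with a hidden class variable. It is not the authors' implementation: it substitutes Gaussian densities for the MTE conditionals, fixes the number of classes in advance, and all names (`augment_cluster`, `log_gauss`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gauss(x, mu, sd):
    # Element-wise Gaussian log-density (stand-in for an MTE density).
    return -0.5 * np.log(2.0 * np.pi * sd ** 2) - (x - mu) ** 2 / (2.0 * sd ** 2)

def augment_cluster(X_cont, X_disc, k, n_iter=100):
    """Naive Bayes clustering with a hidden class via data augmentation:
    alternately sample the hidden labels from their posterior and
    re-estimate the per-class parameters from the completed data."""
    n = X_cont.shape[0]
    n_states = int(X_disc.max()) + 1
    z = rng.integers(k, size=n)  # initial random imputation of the class
    for _ in range(n_iter):
        # Parameter step: smoothed estimates from the imputed labels.
        prior = np.bincount(z, minlength=k) + 1.0
        prior /= prior.sum()
        mu = np.zeros((k, X_cont.shape[1]))
        sd = np.ones((k, X_cont.shape[1]))
        theta = np.ones((k, X_disc.shape[1], n_states))  # Laplace counts
        for c in range(k):
            rows = z == c
            if rows.sum() > 1:
                mu[c] = X_cont[rows].mean(axis=0)
                sd[c] = X_cont[rows].std(axis=0) + 1e-3
            for j in range(X_disc.shape[1]):
                theta[c, j] += np.bincount(X_disc[rows, j], minlength=n_states)
        theta /= theta.sum(axis=2, keepdims=True)
        # Imputation step: sample each z_i from p(class | features) under
        # the naive Bayes factorisation (features independent given class).
        logp = np.log(prior)[None, :]
        for j in range(X_cont.shape[1]):
            logp = logp + log_gauss(X_cont[:, [j]], mu[:, j], sd[:, j])
        for j in range(X_disc.shape[1]):
            logp = logp + np.log(theta[:, j, X_disc[:, j]]).T
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=pi) for pi in p])
    return z

# Toy usage: two well-separated clusters, one continuous and one
# binary feature.
X_cont = np.vstack([rng.normal(0, 1, (50, 1)), rng.normal(5, 1, (50, 1))])
X_disc = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])[:, None]
print(augment_cluster(X_cont, X_disc, k=2)[:10])
```

In the procedure the abstract describes, a loop of this kind would additionally be wrapped in an iterative search over candidate numbers of classes, scoring each completed model and keeping the best; this sketch omits that outer model-selection step.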
