Bayes regression models with missing data using mixtures of truncated exponentials

In recent years, mixtures of truncated exponentials (MTEs) have received much attention within the context of probabilistic graphical models, as they provide a framework for hybrid Bayesian networks that is compatible with standard inference algorithms and imposes no restriction on the network structure. Recently, MTEs have also been successfully applied to regression problems in which the underlying network structure is a naïve Bayes or a TAN. However, the algorithms described so far in the literature operate over complete databases. In this paper we propose an iterative algorithm for constructing naïve Bayes regression models from incomplete databases. It is based on a variation of the data augmentation method in which the missing values of the explanatory variables are filled in by simulating from their posterior distributions, while the missing values of the response variable are generated from its conditional expectation given the explanatory variables. A set of experiments on several databases illustrates that the proposed algorithm behaves reasonably well.
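
The iterative imputation scheme described above can be illustrated with a minimal sketch. It assumes a single continuous response, continuous explanatory variables, and a linear-Gaussian stand-in for the MTE-based naïve Bayes model (MTE estimation is not available in standard libraries); the function name and the simplified posterior-sampling step are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the data-augmentation loop, under the assumptions stated above.
import numpy as np

def data_augmentation_regression(X, y, n_iter=20, seed=0):
    """X: (n, d) array with np.nan for missing entries; y: (n,) array, possibly with np.nan."""
    rng = np.random.default_rng(seed)
    X, y = X.copy(), y.copy()
    miss_X, miss_y = np.isnan(X), np.isnan(y)

    # Initial fill with column means so a first model can be estimated.
    X[miss_X] = np.take(np.nanmean(X, axis=0), np.where(miss_X)[1])
    y[miss_y] = np.nanmean(y)

    for _ in range(n_iter):
        # (Re-)estimate the regression model on the completed data
        # (a linear-Gaussian surrogate standing in for the MTE model).
        Z = np.c_[np.ones(len(y)), X]
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)

        # Missing explanatory values: simulate from an (approximate) posterior,
        # reduced here, purely for illustration, to a Gaussian fitted per column.
        mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-9
        cols = np.where(miss_X)[1]
        X[miss_X] = mu[cols] + sd[cols] * rng.standard_normal(miss_X.sum())

        # Missing response values: fill with the conditional expectation given X.
        y[miss_y] = (np.c_[np.ones(len(y)), X] @ beta)[miss_y]

    return X, y, beta
```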
