Learning Bayesian Networks for Regression from Incomplete Databases

In this paper we address the problem of inducing Bayesian network models for regression from incomplete databases. We use mixtures of truncated exponentials (MTEs) to represent the joint distribution in the induced networks. We consider two particular Bayesian network structures, the so-called naive Bayes and tree-augmented naive Bayes (TAN), which have been successfully used as regression models when learning from complete data. We propose an iterative procedure for inducing the models, based on a variation of the data augmentation method in which the missing values of the explanatory variables are filled by simulating from their posterior distributions, while the missing values of the response variable are imputed with the conditional expectation of the response given the explanatory variables. We also consider refining the regression models through variable selection and bias reduction. We illustrate the performance of the proposed algorithms through a set of experiments on several databases.
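To make the iterative procedure concrete, the following is a minimal sketch of the data-augmentation-style loop described above. It is not the authors' implementation: the function `fit_model` is a hypothetical placeholder for learning a naive Bayes or TAN regression model with MTE densities, and `sample_explanatory` and `expected_response` are hypothetical surrogates for "simulate missing explanatory values from their posterior" and "impute the missing response with its conditional expectation".

```python
# Minimal sketch, assuming a placeholder model in place of MTE-based networks.
import numpy as np

rng = np.random.default_rng(0)

def fit_model(X, y):
    """Hypothetical stand-in for fitting an MTE-based regression network:
    here it only stores per-column means/standard deviations."""
    return {
        "x_mean": np.nanmean(X, axis=0),
        "x_std": np.nanstd(X, axis=0) + 1e-9,
        "y_mean": np.nanmean(y),
    }

def sample_explanatory(model, X):
    """Simulate missing explanatory values (here from a Gaussian surrogate of
    each marginal, standing in for the MTE posterior)."""
    X = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        X[miss, j] = rng.normal(model["x_mean"][j], model["x_std"][j], miss.sum())
    return X

def expected_response(model, X):
    """Conditional expectation of the response given the explanatory variables
    (here a constant surrogate; the real model would use the MTE density)."""
    return np.full(X.shape[0], model["y_mean"])

def impute_and_learn(X, y, iterations=10):
    """Alternate between completing the data and refitting the model."""
    model = fit_model(X, y)
    for _ in range(iterations):
        X_filled = sample_explanatory(model, X)
        y_filled = y.copy()
        miss_y = np.isnan(y_filled)
        y_filled[miss_y] = expected_response(model, X_filled)[miss_y]
        model = fit_model(X_filled, y_filled)
    return model

# Toy usage: a small incomplete database with NaN marking missing cells.
X = np.array([[1.0, 2.0], [np.nan, 1.5], [0.5, np.nan], [1.2, 2.2]])
y = np.array([3.0, np.nan, 2.5, 3.1])
print(impute_and_learn(X, y))
```

The key design point mirrored here is the asymmetric treatment of missing values: explanatory variables are completed stochastically (simulation from their posteriors), whereas the response is completed deterministically via its conditional expectation, after which the model is refit on the completed data.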
