Predictive Discretization during Model Selection

We present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive a joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade-off between goodness of fit and model complexity, including the number of discretization levels. Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels and is therefore independent of the metric used in the continuous space. Our experiments with artificial data as well as with gene expression data show that the choice of discretization strongly affects the resulting network structure.
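The metric-independence property can be illustrated with a small sketch. It is not the scoring function from the paper. It assumes that the "finest grid" places one candidate cut point between each pair of adjacent distinct data values, and the helper names `finest_grid` and `level_counts` are hypothetical. Under that assumption, the per-level counts stay the same when the data are passed through a monotone transformation such as the logarithm, because only the rank order of the points matters.

```python
import math
from bisect import bisect_right


def finest_grid(values):
    """Candidate cut points: one midpoint between each pair of adjacent
    distinct sorted values (an assumed reading of the 'finest grid')."""
    xs = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]


def level_counts(values, cuts):
    """Number of data points falling into each discretization level,
    where `cuts` is a sorted list of thresholds."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    return counts


# Illustrative data only; not taken from the paper.
data = [0.1, 0.4, 0.5, 2.0, 2.2, 9.0]
grid = finest_grid(data)
cuts = [grid[2], grid[4]]  # cut after the 3rd and after the 5th point
counts = level_counts(data, cuts)

# Same cuts, chosen by rank, after a monotone change of scale.
log_data = [math.log(v) for v in data]
log_grid = finest_grid(log_data)
log_cuts = [log_grid[2], log_grid[4]]
log_counts = level_counts(log_data, log_cuts)

print(counts, log_counts)
```

Both calls yield the same counts, since a cut chosen by its position in the sorted data separates the same points on either scale. Any score that is a function of these counts alone inherits that invariance.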
