Learning Module Networks

Methods for learning Bayesian networks can discover dependency structure between observed variables. Although these methods are useful in many applications, they run into computational and statistical problems in domains that involve a large number of variables. In this paper, we consider a solution that is applicable when many variables have similar behavior. We introduce a new class of models, module networks, that explicitly partition the variables into modules, so that the variables in each module share the same parents in the network and the same conditional probability distribution. We define the semantics of module networks, and describe an algorithm that learns the modules' composition and their dependency structure from data. Evaluation on real data in the domains of gene expression and the stock market shows that module networks generalize better than Bayesian networks, and that the learned module network structure reveals regularities that are obscured in learned Bayesian networks.
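To make the parameter sharing concrete, the following is a minimal sketch of one way a module network could be represented. It is not the paper's implementation. The variable names (`A` to `D`), the module names (`M1`, `M2`), the binary domain, and all probabilities are made up for illustration. Each variable is assigned to one module, and every variable in a module uses that module's parent set and conditional probability table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Module:
    # Parent variables shared by every member of the module.
    parents: Tuple[str, ...]
    # Shared table: P(member = 1 | parent values), keyed by parent values.
    cpd: Dict[Tuple[int, ...], float]


# Hypothetical toy network over binary variables A, B, C, D.
modules = {
    "M1": Module(parents=(), cpd={(): 0.3}),
    "M2": Module(parents=("A",), cpd={(0,): 0.1, (1,): 0.8}),
}

# Partition of the variables into modules (each variable in exactly one).
assignment = {"A": "M1", "B": "M2", "C": "M2", "D": "M2"}


def joint_probability(values: Dict[str, int]) -> float:
    """Product over variables of the shared CPD of each variable's module."""
    p = 1.0
    for var, module_name in assignment.items():
        module = modules[module_name]
        key = tuple(values[parent] for parent in module.parents)
        p_one = module.cpd[key]
        p *= p_one if values[var] == 1 else 1.0 - p_one
    return p


def num_parameters() -> int:
    """Count free parameters: one per CPD row, counted once per module."""
    return sum(len(module.cpd) for module in modules.values())


if __name__ == "__main__":
    print(joint_probability({"A": 1, "B": 1, "C": 1, "D": 0}))
    print(num_parameters())
```

In this toy case, B, C, and D share a single two-row table, so the model has 3 free parameters. An ordinary Bayesian network with the same edges would need a separate table per variable, giving 7.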
