A regression tree-based Gibbs sampler to learn the regulation programs in a transcription regulatory module network

Many algorithms have been proposed to learn transcription regulatory networks from gene expression data. Bayesian network approaches, in particular the module network method, have produced promising results. The genes in a module share a regulation program (a regression tree), which consists of a set of parents and conditional probability distributions. Sharing one program across the genes of a module significantly reduces the search space of models and consequently helps avoid overfitting. The regulation program of a module is normally learned by a deterministic search algorithm that performs a series of greedy operations to maximize the Bayesian score. The major shortcoming of deterministic search is that its result may represent only one of several possible regulation programs. To account for this model uncertainty, we propose a regression tree-based Gibbs sampling algorithm for learning regulation programs in module networks. The novelty of this work is that we define a set of tree operations for generating new regression trees from a given tree, and we show that this set of operations is sufficient to produce a well-mixing Gibbs sampler even on large data sets. The effectiveness of our algorithm is demonstrated by experiments on synthetic data and real biological data.
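To make the contrast between greedy search and sampling concrete, the following is a minimal sketch of a single sampling step at one leaf of a regression tree. It is an illustration under stated assumptions, not the algorithm of this paper: the abstract does not specify the tree operations, so the only move shown here is "split the leaf on a regulator at a threshold, or leave it unsplit". The function names (`log_marginal`, `split_score`, `sample_split`), the normal-gamma prior and its hyperparameters, and the toy data are all hypothetical choices made for the example. Each candidate is scored by a Gaussian marginal likelihood, and one candidate is then drawn with probability proportional to the exponentiated score instead of always taking the maximum.

```python
import math
import random


def log_marginal(values, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Log marginal likelihood of `values` under a Gaussian leaf with a
    normal-gamma prior (hyperparameters are illustrative assumptions)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))


def split_score(module_expr, regulator_expr, threshold):
    """Score of splitting the conditions by `regulator_expr <= threshold`:
    the sum of the log marginal likelihoods of the two resulting leaves."""
    left = [m for m, r in zip(module_expr, regulator_expr) if r <= threshold]
    right = [m for m, r in zip(module_expr, regulator_expr) if r > threshold]
    return log_marginal(left) + log_marginal(right)


def sample_split(module_expr, regulators, thresholds, rng):
    """Draw one candidate with probability proportional to exp(score).
    The candidate `None` means "keep the leaf unsplit"."""
    candidates = [None]
    scores = [log_marginal(module_expr)]
    for name, expr in regulators.items():
        for t in thresholds:
            candidates.append((name, t))
            scores.append(split_score(module_expr, expr, t))
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    choice = rng.choices(candidates, weights=weights, k=1)[0]
    return choice, dict(zip(candidates, scores))


if __name__ == "__main__":
    # Toy data (hypothetical): one module value per condition, two regulators.
    module_expr = [-2.0, -2.1, -1.9, 2.0, 2.1, 1.9]
    regulators = {
        "R1": [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0],  # tracks the module
        "R2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # unrelated to the module
    }
    choice, scores = sample_split(module_expr, regulators, [0.0], random.Random(0))
    print("sampled candidate:", choice)
    for candidate, score in scores.items():
        print(candidate, round(score, 3))
```

A greedy search would always return the highest-scoring candidate. Repeating the draw instead yields a collection of regulation programs whose frequencies reflect their relative scores, which is the sense in which sampling accounts for model uncertainty.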
