Modelling Transcriptional Regulation with a Mixture of Factor Analyzers and Variational Bayesian Expectation Maximization

Understanding the mechanisms of gene transcriptional regulation through analysis of high-throughput postgenomic data is one of the central problems of computational systems biology. Various approaches have been proposed, but most of them fail to address at least one of the following objectives: (1) allow for the fact that transcription factors are potentially subject to posttranscriptional regulation; (2) allow for the fact that transcription factors cooperate as a functional complex in regulating gene expression; and (3) provide a model and a learning algorithm with manageable computational complexity. The objective of the present study is to propose and test a method that addresses all three issues. The model we employ is a mixture of factor analyzers, in which the latent variables correspond to different transcription factors, grouped into complexes or modules. We pursue inference in a Bayesian framework, using the Variational Bayesian Expectation Maximization (VBEM) algorithm both for approximate inference of the posterior distributions of the model parameters and for estimation of a lower bound on the marginal likelihood, which is used for model selection. We have evaluated the performance of the proposed method on three criteria: activity profile reconstruction, gene clustering, and network inference.
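
To make the model class concrete, the following is a minimal sketch of the generative process of a generic mixture of factor analyzers: each observation is assigned to one mixture component, and is then generated from that component's loading matrix applied to standard-normal latent factors, plus a component mean and diagonal Gaussian noise. This is the textbook formulation, not the study's implementation. The function name `sample_mfa`, its parameters, and the way parameters are drawn are all illustrative assumptions, as is any mapping of samples and features onto genes and experimental conditions. No VBEM inference is performed here.

```python
import numpy as np


def sample_mfa(n_samples, n_features, n_factors, n_components, seed=0):
    """Draw data from a generic mixture of factor analyzers (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Mixing proportions over components (hypothetical symmetric Dirichlet draw).
    mixing = rng.dirichlet(np.ones(n_components))

    # One loading matrix and one mean vector per component.
    loadings = rng.normal(size=(n_components, n_features, n_factors))
    means = rng.normal(size=(n_components, n_features))

    # Diagonal noise variances, shared across components in this sketch.
    noise_var = rng.uniform(0.05, 0.2, size=n_features)

    # Component label and standard-normal latent factors for every sample.
    labels = rng.choice(n_components, size=n_samples, p=mixing)
    factors = rng.normal(size=(n_samples, n_factors))

    # x_n = Lambda_k z_n + mu_k + eps_n, with k = labels[n].
    signal = np.einsum("ndk,nk->nd", loadings[labels], factors)
    noise = rng.normal(size=(n_samples, n_features)) * np.sqrt(noise_var)
    data = signal + means[labels] + noise
    return data, labels, factors


if __name__ == "__main__":
    data, labels, factors = sample_mfa(
        n_samples=200, n_features=10, n_factors=3, n_components=4
    )
    print(data.shape, labels.shape, factors.shape)
```

Fitting such a model means recovering the labels, factors, and loadings from `data` alone. In the approach described above, that is done with VBEM, and the lower bound on the marginal likelihood is used to choose among candidate models.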
