Unraveling transcriptional regulatory programs by integrative analysis of microarray and transcription factor binding data

Motivation: Unraveling the transcriptional regulatory programs mediated by transcription factors (TFs) is a fundamental objective of computational biology, yet it remains a challenge.

Method: We present a new methodology that integrates microarray and TF binding data to unravel transcriptional regulatory networks. The algorithm is based on a two-stage constrained matrix decomposition model. The model takes into account the non-linear structure in gene expression data, particularly in the TF-target gene interactions, and the combinatorial nature of gene regulation by TFs. The gene expression profile is modeled as a linear weighted combination of the activity profiles of a set of TFs. The TF activity profiles are deduced from the expression levels of the TFs' target genes rather than directly from the expression levels of the TFs themselves. The TF-target gene relationships are derived from ChIP-chip and other TF binding data. The proposed algorithm not only identifies transcriptional modules but also reveals the regulatory program: which TFs control which target genes, and in which way (activation or inhibition).

Results: In comparison with other methods, our algorithm identifies biologically more meaningful transcriptional modules relating to specific TFs. We applied the new algorithm to yeast cell cycle and stress response data. Known transcriptional regulations were confirmed, and novel TF-gene interactions were predicted that provide new insights into the regulatory mechanisms of the cell.

Contact: zhanmi@mail.nih.gov

Supplementary information: Supplementary data are available at Bioinformatics online.
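
The linear part of the model described in the abstract can be illustrated with a small sketch. It is not the authors' two-stage algorithm, whose details are not given here. It only shows the general idea of a connectivity-constrained decomposition: an expression matrix E (genes x conditions) is approximated by W P, where P holds TF activity profiles and the weight matrix W is forced to zero wherever a binding mask B reports no TF-gene interaction. The function names, the toy data, the gradient-descent fitting and all parameter values are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def residual(E, W, P):
    """Element-wise E - W P."""
    WP = matmul(W, P)
    return [[e - f for e, f in zip(e_row, f_row)] for e_row, f_row in zip(E, WP)]


def sq_error(E, W, P):
    """Sum of squared reconstruction errors."""
    return sum(r * r for row in residual(E, W, P) for r in row)


def fit_masked(E, B, steps=2000, lr=0.01, seed=0):
    """Fit E ~ W P by gradient descent, keeping W[g][t] = 0 where B[g][t] = 0.

    E: genes x conditions expression matrix.
    B: genes x TFs binary mask (1 = binding evidence, e.g. from ChIP-chip).
    Returns (W, P): genes x TFs weights and TFs x conditions activities.
    """
    rng = random.Random(seed)
    n_genes, n_conds, n_tfs = len(E), len(E[0]), len(B[0])
    W = [[rng.uniform(-0.5, 0.5) * B[g][t] for t in range(n_tfs)]
         for g in range(n_genes)]
    P = [[rng.uniform(-0.5, 0.5) for _ in range(n_conds)] for _ in range(n_tfs)]
    for _ in range(steps):
        R = residual(E, W, P)
        step_W = matmul(R, transpose(P))   # descent direction for W
        step_P = matmul(transpose(W), R)   # descent direction for P
        # Multiplying by the mask re-imposes the binding constraint each step.
        W = [[(W[g][t] + lr * step_W[g][t]) * B[g][t] for t in range(n_tfs)]
             for g in range(n_genes)]
        P = [[P[t][c] + lr * step_P[t][c] for c in range(n_conds)]
             for t in range(n_tfs)]
    return W, P


# Hypothetical toy data: 4 genes, 2 TFs, 5 conditions.
B = [[1, 0], [1, 0], [0, 1], [1, 1]]
true_W = [[1.0, 0.0], [0.8, 0.0], [0.0, -1.0], [0.5, 0.7]]
true_P = [[1.0, 0.5, -0.5, -1.0, 0.0], [0.0, 1.0, 1.0, -1.0, -0.5]]
E = matmul(true_W, true_P)

W, P = fit_masked(E, B)
print("squared reconstruction error:", sq_error(E, W, P))
```

In a decomposition of this kind, each TF's weights and activity profile are determined only up to a common scale and sign, so reading a positive weight as activation and a negative one as inhibition requires an additional convention, for example fixing the sign of each activity profile.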
