Unsupervised learning of transcriptional regulatory networks via latent tree graphical models

Author(s): Gitter, Anthony; Huang, Furong; Valluvan, Ragupathyraj; Fraenkel, Ernest; Anandkumar, Animashree

Abstract: Gene expression is a readily observed quantification of transcriptional activity and cellular state that enables recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that a regulator's activity is represented by its mRNA level. We use a latent tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for regulator activity. The latent tree model is a type of Markov random field that contains both observed gene variables and latent (hidden) variables, and whose joint distribution factorizes according to a tree. Through efficient unsupervised learning, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of those regulators. Post-processing annotates many of these discovered latent variables as specific transcription factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators but instead reveal hidden structure in the gene expression data, such as shared biological function. We apply the latent tree graphical model to a yeast stress response dataset. In addition to making novel predictions, such as condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory network, including groups of co-regulated genes, condition-specific regulator activity, and combinatorial regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene expression data that requires no prior knowledge of which regulators exist, what their activity levels are, or where transcription factors physically bind.
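The abstract does not describe the learning algorithm itself. As a hedged illustration of how a latent tree can expose co-regulated gene groups without ever observing the regulators, the sketch below swaps in the distance-based recursive grouping idea from Choi, Tan, Anandkumar and Willsky's "Learning Latent Tree Graphical Models"; it is not presented as the authors' implementation. It assumes a linear-Gaussian latent tree with unit-variance variables, in which the correlation between two nodes is the product of the edge correlations along the path joining them, so the information distance d_ij = -log|rho_ij| is additive along the tree. Two observed genes are then grouped under a common hidden parent when d_ik - d_jk takes the same value for every other observed gene k. The tree, the edge weights, and all names (H1, H2, g1 to g6, and every function) are hypothetical.

```python
import math

# Hypothetical toy latent tree (not from the paper): two hidden regulators,
# H1 and H2, joined by one edge, each with three observed target genes.
# Each edge weight is the correlation between its two endpoint variables,
# assuming a linear-Gaussian tree with unit-variance variables.
EDGES = {
    ("H1", "H2"): 0.6,
    ("H1", "g1"): 0.9, ("H1", "g2"): 0.8, ("H1", "g3"): 0.7,
    ("H2", "g4"): 0.85, ("H2", "g5"): 0.75, ("H2", "g6"): 0.95,
}
GENES = ["g1", "g2", "g3", "g4", "g5", "g6"]  # observed leaves only


def neighbors(node):
    """Yield (neighbor, edge correlation) pairs for a node in the tree."""
    for (u, v), rho in EDGES.items():
        if u == node:
            yield v, rho
        elif v == node:
            yield u, rho


def path_correlation(src, dst):
    """In a Gaussian tree with unit variances, the correlation between two
    nodes is the product of the edge correlations on the path joining them."""
    stack = [(src, None, 1.0)]
    while stack:
        node, parent, rho = stack.pop()
        if node == dst:
            return rho
        for nxt, w in neighbors(node):
            if nxt != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((nxt, node, rho * w))
    raise ValueError(f"{src} and {dst} are not connected")


def info_distance(i, j):
    """Information distance d_ij = -log|rho_ij|, additive along tree paths."""
    return -math.log(abs(path_correlation(i, j)))


def are_siblings(i, j, tol=1e-9):
    """Sibling check for observed leaves with exact distances: i and j are
    grouped when Phi_ijk = d_ik - d_jk is the same for every other gene k."""
    phis = [info_distance(i, k) - info_distance(j, k)
            for k in GENES if k not in (i, j)]
    return max(phis) - min(phis) < tol


def sibling_groups(genes):
    """Greedily partition genes into groups that pass the sibling check."""
    groups = []
    for g in genes:
        for grp in groups:
            if are_siblings(g, grp[0]):
                grp.append(g)
                break
        else:
            groups.append([g])
    return groups


if __name__ == "__main__":
    print(sibling_groups(GENES))  # [['g1', 'g2', 'g3'], ['g4', 'g5', 'g6']]
```

Each recovered group corresponds to one hidden node (H1 or H2) that the procedure never observed. With real expression data the correlations would be estimated from samples, so the constancy check would need a statistical tolerance rather than 1e-9, and, as the abstract notes, the resulting latent variables still require post-processing before they can be read as specific transcription factors.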
