Allegro: Analyzing expression and sequence in concert to discover regulatory programs

A major goal of systems biology is the characterization of transcription factors and microRNAs (miRNAs) and the transcriptional programs they regulate. We present Allegro, a method for de novo discovery of cis-regulatory transcriptional programs through joint analysis of genome-wide expression data and promoter or 3′ UTR sequences. The algorithm uses a novel log-likelihood-based, non-parametric model to describe the expression pattern shared by a group of co-regulated genes. We show that Allegro is more accurate and sensitive than existing techniques, and that it can simultaneously analyze multiple expression datasets with more than 100 conditions. We apply Allegro to datasets from several species and report on the transcriptional modules it uncovers. Our analysis reveals a novel motif over-represented in the promoters of genes highly expressed in murine oocytes, and several new motifs related to fly development. Finally, using stem-cell expression profiles, we identify three miRNA families with pivotal roles in human embryogenesis.
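To make the idea of a log-likelihood-based, non-parametric expression model concrete, the sketch below shows one way such a score could be computed. It is an illustration under stated assumptions, not Allegro's actual implementation: expression values are assumed to be already discretized into the levels -1, 0 and 1, the group model is simply the per-condition frequency of each level with a pseudocount, and the function names `fit_profile_model` and `log_likelihood_ratio` are hypothetical.

```python
import math
from collections import Counter

# Assumption: expression is discretized into three levels (down, unchanged, up).
LEVELS = (-1, 0, 1)

def fit_profile_model(profiles, levels=LEVELS, pseudocount=1.0):
    """Estimate, for each condition, the frequency of each discrete level.

    `profiles` is a list of equal-length tuples, one tuple per gene.
    No distributional form is assumed; the model is just smoothed counts.
    """
    n_conditions = len(profiles[0])
    total = len(profiles) + pseudocount * len(levels)
    model = []
    for c in range(n_conditions):
        counts = Counter(p[c] for p in profiles)
        model.append({lv: (counts[lv] + pseudocount) / total for lv in levels})
    return model

def log_likelihood_ratio(profile, group_model, background_model):
    """Sum over conditions of log P(level | group) / P(level | background)."""
    return sum(
        math.log(group_model[c][lv] / background_model[c][lv])
        for c, lv in enumerate(profile)
    )

# Toy data (invented for illustration): two conditions, six background genes,
# and a candidate group of three genes that tend to be up in condition 0.
background = [(0, 0), (0, 0), (1, -1), (-1, 1), (0, 0), (1, 1)]
group = [(1, -1), (1, -1), (1, 0)]

group_model = fit_profile_model(group)
background_model = fit_profile_model(background)

score_similar = log_likelihood_ratio((1, -1), group_model, background_model)
score_opposite = log_likelihood_ratio((-1, 1), group_model, background_model)
```

In this toy setting a gene whose profile resembles the group's shared pattern receives a positive score and a gene with the opposite pattern receives a negative one. A method of this kind would use such scores to decide which genes' expression fits the pattern shared by the genes carrying a candidate motif.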
