Computational regulatory genomics: motifs, networks, and dynamics

Gene regulation, the process responsible for taking a static genome and producing the diversity and complexity of life, is largely mediated through the sequence-specific binding of regulators. The short, degenerate nature of the recognized elements and the unknown rules through which they interact make deciphering gene regulation a significant challenge. In this thesis, we use comparative genomics and other approaches to exploit large-scale experimental datasets and better understand the sequence elements and regulators responsible for regulatory programs. In particular, we develop new computational approaches to (1) predict the binding sites of regulators using the genomes of many closely related species; (2) understand the sequence motifs associated with transcription factors; (3) discover and characterize microRNAs, an important class of regulators; (4) use static predictions for binding sites in conjunction with chromatin modifications to better understand the dynamics of regulation; and (5) systematically validate the predicted motif instances using a massively parallel reporter assay. We find that the predictions made by our algorithms are of high quality and are comparable to those made by leading experimental approaches. Moreover, we find that experimental and computational approaches are often complementary. Regions experimentally identified as bound by a factor can be species- and cell-line-specific, but they lack the resolution and unbiased nature of our predictions. Experimentally identified miRNAs show unmistakable signs of being processed, but cannot provide the same insights that our machine learning framework does. Further emphasizing the importance of integration, combining chromatin mark annotations and gene expression from multiple cell types with our static motif instances increases our power and yields additional biologically relevant insights.
We successfully apply the algorithms in this thesis to 29 mammals and 12 flies and expect them to be applicable to other clades of eukaryotic species. Moreover, we find that our performance has not yet plateaued and believe these methods will continue to be relevant as sequencing becomes increasingly commonplace and thousands of genomes become available.

[1]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[2]  Henriette O'Geen,et al.  Genome-wide binding of the orphan nuclear receptor TR4 suggests its general role in fundamental biological processes , 2010, BMC Genomics.

[3]  N. L. La Thangue,et al.  p300/CBP proteins: HATs for transcriptional bridges and scaffolds. , 2001, Journal of cell science.

[4]  Lucas D. Ward,et al.  Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences , 2008, ISMB.

[5]  R. Donnell,et al.  Role of Chromodomain Helicase DNA binding protein 2 in DNA damage response signaling and tumorigenesis , 2008, Oncogene.

[6]  D. Haussler,et al.  Aligning multiple genomic sequences with the threaded blockset aligner. , 2004, Genome research.

[7]  Timothy B. Stockwell,et al.  The Sequence of the Human Genome , 2001, Science.

[8]  R. Wharton,et al.  The Pumilio RNA-binding domain is also a translational regulator. , 1998, Molecular cell.

[9]  E. Birney,et al.  Trawler: de novo regulatory motif discovery pipeline for chromatin immunoprecipitation , 2007, Nature Methods.

[10]  G. Hart,et al.  The Ubiquitin Carboxyl Hydrolase BAP1 Forms a Ternary Complex with YY1 and HCF-1 and Is a Critical Regulator of Gene Expression , 2010, Molecular and Cellular Biology.

[11]  R. Ferrell,et al.  Mutational inactivation of the p53 gene in the human erythroid leukemic K562 cell line. , 1993, Leukemia research.

[12]  E. Furlong,et al.  A core transcriptional network for early mesoderm development in Drosophila melanogaster. , 2007, Genes & development.

[13]  D. Gifford,et al.  Tissue-specific transcriptional regulation has diverged significantly between human and mouse , 2007, Nature Genetics.

[14]  M. Nóbrega,et al.  Scanning Human Gene Deserts for Long-Range Enhancers , 2003, Science.

[15]  Xiaohui Xie,et al.  Identifying novel constrained elements by exploiting biased substitution patterns , 2009, Bioinform..

[16]  L. Hood,et al.  Regulatory gene networks and the properties of the developmental process , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[17]  Roded Sharan,et al.  CREME: a framework for identifying cis-regulatory modules in human-mouse conserved segments , 2003, ISMB.

[18]  Gil Alterovitz,et al.  Automation in Proteomics and Genomics: An Engineering Case-Based Approach , 2009 .

[19]  E. Zandi,et al.  AP-1 function and regulation. , 1997, Current opinion in cell biology.

[20]  G. Mosialos,et al.  Epstein-Barr virus transformation: involvement of latent membrane protein 1-mediated activation of NF-κB , 1999, Oncogene.

[21]  G. Stormo,et al.  Analysis of Homeodomain Specificities Allows the Family-wide Prediction of Preferred Recognition Sites , 2008, Cell.

[22]  E. Scott,et al.  Requirement of transcription factor PU.1 in the development of multiple hematopoietic lineages. , 1994, Science.

[23]  David J. Arenillas,et al.  oPOSSUM: identification of over-represented transcription factor binding sites in co-expressed genes , 2005, Nucleic acids research.

[24]  J. Miklossy,et al.  Interferon Regulatory Factor 4 Is Involved in Epstein-Barr Virus-Mediated Transformation of Human B Lymphocytes , 2008, Journal of Virology.

[25]  Xiaohui Xie,et al.  MotifMap: a human genome-wide map of candidate regulatory motif sites , 2009, Bioinform..

[26]  Michael Gribskov,et al.  Combining evidence using p-values: application to sequence homology searches , 1998, Bioinform..

[27]  I. Davydov,et al.  The role of NF-Y and IRF-2 in the regulation of human IL-4 gene expression. , 1994, Journal of immunology.

[28]  William Stafford Noble,et al.  Assessing computational tools for the discovery of transcription factor binding sites , 2005, Nature Biotechnology.

[29]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[30]  B. Wasylyk,et al.  Pax-5 (BSAP) recruits Ets proto-oncogene family proteins to form functional ternary complexes on a B-cell-specific promoter. , 1996, Genes & development.

[31]  Nathaniel D. Heintzman,et al.  Histone modifications at human enhancers reflect global cell-type-specific gene expression , 2009, Nature.

[32]  Manolis Kellis,et al.  A single Hox locus in Drosophila produces functional microRNAs from opposite DNA strands. , 2008, Genes & development.

[33]  M. Schweizer,et al.  Interaction between the two ubiquitously expressed transcription factors NF-Y and Sp1. , 1999, Gene.

[34]  A. Bird,et al.  DNA methylation landscapes: provocative insights from epigenomics , 2008, Nature Reviews Genetics.

[35]  Manolis Kellis,et al.  Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. , 2007, Genes & development.

[36]  A. Visel,et al.  ChIP-seq accurately predicts tissue-specific activity of enhancers , 2009, Nature.

[37]  J. Fak,et al.  Transcriptional Control in the Segmentation Gene Network of Drosophila , 2004, PLoS biology.

[38]  A. Philippakis,et al.  Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities , 2006, Nature Biotechnology.

[39]  Wyeth W. Wasserman,et al.  JASPAR: an open-access database for eukaryotic transcription factor binding profiles , 2004, Nucleic Acids Res..

[40]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[41]  D. Botstein,et al.  Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF , 2001, Nature.

[42]  Nathan C. Sheffield,et al.  Open chromatin defined by DNaseI and FAIRE identifies regulatory elements that shape cell-type identity. , 2011, Genome research.

[43]  S. Orkin,et al.  GATA transcription factors: key regulators of hematopoiesis. , 1995, Experimental hematology.

[44]  J. Kononen,et al.  POU5F1 (OCT3/4) identifies cells with pluripotent potential in human germ cell tumors. , 2003, Cancer research.

[45]  Timothy L. Bailey,et al.  Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data , 2010, BMC Bioinformatics.

[46]  A. Hagemeijer,et al.  Sequence conservation of the rad21 Schizosaccharomyces pombe DNA double-strand break repair gene in human and mouse. , 1996, Genomics.

[47]  D. Bartel MicroRNAs: Target Recognition and Regulatory Functions , 2009, Cell.

[48]  M. Luftig,et al.  MDM2-Dependent Inhibition of p53 Is Required for Epstein-Barr Virus B-Cell Growth Transformation and Infected-Cell Survival , 2009, Journal of Virology.

[49]  Z. Weng,et al.  Detection of functional DNA motifs via statistical over-representation. , 2004, Nucleic acids research.

[50]  B. Oostra,et al.  A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. , 2003, Human molecular genetics.

[51]  I. Amit,et al.  Comprehensive mapping of long range interactions reveals folding principles of the human genome , 2011 .

[52]  R. Tantravahi,et al.  Myb and Ets proteins cooperate in transcriptional activation of the mim-1 promoter. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[53]  R. Gregory,et al.  Many roads to maturity: microRNA biogenesis pathways and their regulation , 2009, Nature Cell Biology.

[54]  Ivan Ovcharenko,et al.  Predicting tissue-specific enhancers in the human genome. , 2006, Genome research.

[55]  M. Blanchette,et al.  Discovery of regulatory elements by a computational method for phylogenetic footprinting. , 2002, Genome research.

[56]  G. Rubin,et al.  Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[57]  S. Cherukuri,et al.  Developmental function of HMGN proteins. , 2010, Biochimica et biophysica acta.

[58]  J. Reis-Filho,et al.  The impact of expression profiling on prognostic and predictive testing in breast cancer , 2006, Journal of Clinical Pathology.

[59]  R. Patient,et al.  Phosphorylation of GATA-1 increases its DNA-binding affinity and is correlated with induction of human K562 erythroleukaemia cells. , 1999, Nucleic acids research.

[60]  C. Burge,et al.  Most mammalian mRNAs are conserved targets of microRNAs. , 2008, Genome research.

[61]  T. Mikkelsen,et al.  The NIH Roadmap Epigenomics Mapping Consortium , 2010, Nature Biotechnology.

[62]  B. Birren,et al.  Sequencing and comparison of yeast species to identify genes and regulatory elements , 2003, Nature.

[63]  Martin Vingron,et al.  PASTAA: identifying transcription factors associated with sets of co-regulated genes , 2008, Bioinform..

[64]  C. Deng,et al.  Roles of BRCA1 in DNA damage repair: a link between development and cancer. , 2003, Human molecular genetics.

[65]  P. Park ChIP–seq: advantages and challenges of a maturing technology , 2009, Nature Reviews Genetics.

[66]  Jun S. Liu,et al.  An algorithm for finding protein–DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments , 2002, Nature Biotechnology.

[67]  T. Tatusova,et al.  Entrez Gene: gene-centered information at NCBI , 2010, Nucleic Acids Res..

[68]  Bronwen L. Aken,et al.  GENCODE: The reference human genome annotation for The ENCODE Project , 2012, Genome research.

[69]  Inna Dubchak,et al.  VISTA Enhancer Browser—a database of tissue-specific human enhancers , 2006, Nucleic Acids Res..

[70]  H. Aburatani,et al.  Cohesin mediates transcriptional insulation by CCCTC-binding factor , 2008, Nature.

[71]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[72]  Raymond K. Auerbach,et al.  A User's Guide to the Encyclopedia of DNA Elements (ENCODE) , 2011, PLoS biology.

[73]  Rafael A Irizarry,et al.  Exploration, normalization, and summaries of high density oligonucleotide array probe level data. , 2003, Biostatistics.

[74]  Jane M J Lin,et al.  Identification and Characterization of Cell Type–Specific and Ubiquitous Chromatin Regulatory Structures in the Human Genome , 2007, PLoS genetics.

[75]  Ling V. Sun,et al.  Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster , 2006, Proceedings of the National Academy of Sciences.

[76]  E. B. Wilson Probable Inference, the Law of Succession, and Statistical Inference , 1927 .

[77]  Michael P. Eichenlaub,et al.  A temporal map of transcription factor activity: mef2 directly regulates target genes at all stages of muscle development. , 2006, Developmental cell.

[78]  R. Costa,et al.  Transcription factors in liver development, differentiation, and regeneration , 2003, Hepatology.

[79]  B. Thiers Genomic Instability and Aging-like Phenotype in the Absence of Mammalian SIRT6 , 2007 .

[80]  Marc S Halfon,et al.  Computational discovery of cis-regulatory modules in Drosophila without prior knowledge of motifs , 2008, Genome Biology.

[81]  V. Corces,et al.  CTCF: Master Weaver of the Genome , 2009, Cell.

[82]  S. Orkin,et al.  An Extended Transcriptional Network for Pluripotency of Embryonic Stem Cells , 2008, Cell.

[83]  C. Blattner,et al.  Nuclear accumulation and activation of p53 in embryonic stem cells after DNA damage , 2009, BMC Cell Biology.

[84]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[85]  Axel Behrens,et al.  Interaction of phosphorylated c-Jun with TCF4 regulates intestinal cancer development , 2005, Nature.

[86]  A. Aguzzi,et al.  Pax-5 encodes the transcription factor BSAP and is expressed in B lymphocytes, the developing CNS, and adult testis. , 1992, Genes & development.

[87]  B. Charlesworth,et al.  Evolution on the X chromosome: unusual patterns and processes , 2006, Nature Reviews Genetics.

[88]  Liming Cai,et al.  BEST: Binding-site Estimation Suite of Tools , 2005, Bioinform..

[89]  M. Bittner,et al.  Expression profiling using cDNA microarrays , 1999, Nature Genetics.

[90]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[91]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[92]  Hans Lassmann,et al.  The Widespread Impact of Mammalian MicroRNAs on mRNA Repression and Evolution , 2005 .

[93]  Tim J. P. Hubbard,et al.  Large-Scale Discovery of Promoter Motifs in Drosophila melanogaster , 2006, PLoS Comput. Biol..

[94]  Aviv Regev,et al.  Transcriptional Regulatory Circuits: Predicting Numbers from Alphabets , 2009, Science.

[95]  Rongxiang Liu,et al.  Computationally identifying novel NF-kappa B-regulated immune genes in the human genome. , 2003, Genome research.

[96]  M. Gerstein,et al.  Unlocking the secrets of the genome , 2009, Nature.

[97]  L. Penland,et al.  Use of a cDNA microarray to analyse gene expression patterns in human cancer , 1996, Nature Genetics.

[98]  Graziano Pesole,et al.  An algorithm for finding signals of unknown length in DNA sequences , 2001, ISMB.

[99]  D. Corcoran,et al.  Features of Mammalian microRNA Promoters Emerge from Polymerase II Chromatin Immunoprecipitation Data , 2009, PloS one.

[100]  Timothy J. Durham,et al.  "Systematic" , 1966, Comput. J..

[101]  John J. Wyrick,et al.  Genome-wide location and function of DNA binding proteins. , 2000, Science.

[102]  Alexander Varshavsky,et al.  Mapping proteinDNA interactions in vivo with formaldehyde: Evidence that histone H4 is retained on a highly transcribed gene , 1988, Cell.

[103]  M. Gerstein,et al.  Close association of RNA polymerase II and many transcription factors with Pol III genes , 2010, Proceedings of the National Academy of Sciences.

[104]  Jocelyn Kaiser,et al.  A Plan to Capture Human Diversity in 1000 Genomes , 2008, Science.

[105]  Gert Lubec,et al.  Limitations of current proteomics technologies. , 2005, Journal of chromatography. A.

[106]  M. Noyes,et al.  A systematic characterization of factors that regulate Drosophila segmentation via a bacterial one-hybrid system , 2008, Nucleic acids research.

[107]  Manolis Kellis,et al.  Discovery and Characterization of Chromatin States for Systematic Annotation of the Human Genome , 2011, RECOMB.

[108]  R. Aharonov,et al.  Identification of hundreds of conserved and nonconserved human microRNAs , 2005, Nature Genetics.

[109]  F. Stossi,et al.  Whole-Genome Cartography of Estrogen Receptor α Binding Sites , 2007, PLoS genetics.

[110]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[111]  Grace Jordison Molecular Biology of the Gene , 1965, The Yale Journal of Biology and Medicine.

[112]  J. Milton,et al.  Identification of multiple cyclin subunits of human P-TEFb. , 1998, Genes & development.

[113]  L. Hillier,et al.  PCAP: a whole-genome assembly program. , 2003, Genome research.

[114]  X. Chen,et al.  The Oct4 and Nanog transcription network regulates pluripotency in mouse embryonic stem cells , 2006, Nature Genetics.

[115]  Manolis Kellis,et al.  Conservation of small RNA pathways in platypus Material Supplemental , 2008 .

[116]  D. Hursh,et al.  Odd paired transcriptional activation of decapentaplegic in the Drosophila eye/antennal disc is cell autonomous but indirect. , 2010, Developmental biology.

[117]  James B. Brown,et al.  Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions , 2009, Genome Biology.

[118]  Manolis Kellis,et al.  Reliable prediction of regulator targets using 12 Drosophila genomes. , 2007, Genome research.

[119]  Alan M. Moses,et al.  MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model , 2004, Genome Biology.

[120]  Walter Fontana,et al.  Fast folding and comparison of RNA secondary structures , 1994 .

[121]  David J. Reiss,et al.  CTCF physically links cohesin to chromatin , 2008, Proceedings of the National Academy of Sciences.

[122]  Victor X. Jin,et al.  Genomic Targets of the KRAB and SCAN Domain-containing Zinc Finger Protein 263* , 2009, The Journal of Biological Chemistry.

[123]  N. Rajewsky,et al.  Discovering microRNAs from deep sequencing data using miRDeep , 2008, Nature Biotechnology.

[124]  Ernest Fraenkel,et al.  High-resolution computational models of genome binding events , 2006, Nature Biotechnology.

[125]  Members of the Meis 1 and Pbx Homeodomain Protein Families Cooperatively Bind a cAMP-responsive Sequence ( CRS 1 ) from Bovine CYP 17 * , 1998 .

[126]  Eugene Berezikov,et al.  Approaches to microRNA discovery , 2006, Nature Genetics.

[127]  H. Jockusch,et al.  The human gene ZFP161 on 18p11.21-pter encodes a putative c-myc repressor and is homologous to murine Zfp161 (Chr 17) and Zfp161-rs1 (X Chr) , 1997, Genomics.

[128]  G. Crabtree,et al.  Diversity and specialization of mammalian SWI/SNF complexes. , 1996, Genes & development.

[129]  Tommi S. Jaakkola,et al.  Fast optimal leaf ordering for hierarchical clustering , 2001, ISMB.

[130]  Melanie A. Huntley,et al.  Evolution of genes and genomes on the Drosophila phylogeny , 2007, Nature.

[131]  S. Yamanaka,et al.  Induction of Pluripotent Stem Cells from Mouse Embryonic and Adult Fibroblast Cultures by Defined Factors , 2006, Cell.

[132]  Michael D. Wilson,et al.  Five-Vertebrate ChIP-seq Reveals the Evolutionary Dynamics of Transcription Factor Binding , 2010, Science.

[133]  Keji Zhao,et al.  domains barrier regions reveals demarcation of active and repressive Global analysis of the insulator binding protein CTCF in chromatin Material , 2008 .

[134]  Alexander E. Kel,et al.  TRANSFAC®: transcriptional regulation, from patterns to profiles , 2003, Nucleic Acids Res..

[135]  Stephen M. Mount,et al.  The genome sequence of Drosophila melanogaster. , 2000, Science.

[136]  Albert J. Vilella,et al.  A high-resolution map of human evolutionary constraint using 29 mammals , 2011, Nature.

[137]  S. Pietrokovski Searching databases of conserved sequence regions by aligning protein multiple-alignments. , 1996, Nucleic acids research.

[138]  Tom H. Pringle,et al.  The human genome browser at UCSC. , 2002, Genome research.

[139]  Thomas Shenk,et al.  YY1 is an initiator sequence-binding protein that directs and activates transcription in vitro , 1991, Nature.

[140]  S. Schuster Next-generation sequencing transforms today's biology , 2008, Nature Methods.

[141]  Donna R. Maglott,et al.  RefSeq and LocusLink: NCBI gene-centered resources , 2001, Nucleic Acids Res..

[142]  V. Sementchenko,et al.  Ets target genes: past, present and future , 2000, Oncogene.

[143]  B. Monsarrat,et al.  The THAP-Zinc Finger Protein THAP1 Associates with Coactivator HCF-1 and O-GlcNAc Transferase , 2010, The Journal of Biological Chemistry.

[144]  A. Regev,et al.  Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA , 2010, Nature Methods.

[145]  Manolis Kellis,et al.  Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. , 2007, Genome research.

[146]  H. Lodish Molecular Cell Biology , 1986 .

[147]  R. Brent,et al.  Mxi1, a protein that specifically interacts with Max to bind Myc-Max recognition sites. , 1993, Cell.

[148]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[149]  O Bernard,et al.  Expression of tal-1 and GATA-binding proteins during human hematopoiesis. , 1993, Blood.

[150]  P. Farnham Insights from genomic profiling of transcription factors , 2009, Nature Reviews Genetics.

[151]  Promoter analysis by saturation mutagenesis , 2001, Biological Procedures Online.

[152]  F. Yi,et al.  Stem cells and TCF proteins: a role for beta-catenin--independent functions. , 2007, Stem cell reviews.

[153]  J. Carroll,et al.  Pioneer transcription factors: establishing competence for gene expression. , 2011, Genes & development.

[154]  Lovelace J. Luquette,et al.  Comprehensive analysis of the chromatin landscape in Drosophila , 2010, Nature.

[155]  E. Segal,et al.  Predicting expression patterns from regulatory sequence in Drosophila segmentation , 2008, Nature.

[156]  G. Church,et al.  Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. , 2000, Journal of molecular biology.

[157]  B. Blencowe Alternative Splicing: New Insights from Global Analyses , 2006, Cell.

[158]  J. Ross Quinlan,et al.  Improved Use of Continuous Attributes in C4.5 , 1996, J. Artif. Intell. Res..

[159]  A. J. Schroeder,et al.  Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. , 2007, Genome research.

[160]  D. W. Knowles,et al.  Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm , 2008, PLoS biology.

[161]  Zhiping Weng,et al.  Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification. , 2004, Nucleic acids research.

[162]  L. Fulton,et al.  Finding Functional Features in Saccharomyces Genomes by Phylogenetic Footprinting , 2003, Science.

[163]  D. Bartel MicroRNAs Genomics, Biogenesis, Mechanism, and Function , 2004, Cell.

[164]  William Stafford Noble,et al.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[165]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[166]  L. Lim,et al.  An Abundant Class of Tiny RNAs with Probable Regulatory Roles in Caenorhabditis elegans , 2001, Science.

[167]  James R. Knight,et al.  Genome sequencing in microfabricated high-density picolitre reactors , 2005, Nature.

[168]  Alexandre V. Morozov,et al.  Statistical mechanical modeling of genome-wide transcription factor occupancy data by MatrixREDUCE , 2006, ISMB.

[169]  Vincent Bertrand,et al.  A combinatorial code of maternal GATA, Ets and β-catenin-TCF transcription factors specifies and patterns the early ascidian ectoderm , 2007, Development.

[170]  J. Lieb,et al.  ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. , 2004, Genomics.

[171]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[172]  Cathy H. Wu,et al.  The Universal Protein Resource (UniProt) , 2006, Nucleic Acids Research.

[173]  S. Orkin,et al.  Zinc-finger transcription factor Gfi-1: versatile regulator of lymphocytes, neutrophils and hematopoietic stem cells , 2006, Current opinion in hematology.

[174]  Yuchun Guo,et al.  Discovering homotypic binding events at high spatial resolution , 2010, Bioinform..

[175]  Gautier Koscielny,et al.  Ensembl Genomes: Extending Ensembl across the taxonomic space , 2009, Nucleic Acids Res..

[176]  Jong-Wan Park,et al.  ROS mediate the hypoxic repression of the hepcidin gene by inhibiting C/EBPalpha and STAT-3. , 2007, Biochemical and biophysical research communications.

[177]  B. Berger,et al.  Conserved microRNA targeting in Drosophila is as widespread in coding regions as in 3′UTRs , 2010, Proceedings of the National Academy of Sciences.

[178]  Daniel E. Newburger,et al.  Diversity and Complexity in DNA Recognition by Transcription Factors , 2009, Science.

[179]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[180]  Colin N. Dewey,et al.  Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures , 2007, Nature.

[181]  John T. Lis,et al.  Transcription Regulation Through Promoter-Proximal Pausing of RNA Polymerase II , 2008, Science.

[182]  Mark Gerstein,et al.  Divergence of transcription factor binding sites across related yeast species. , 2007, Science.

[183]  David Osumi-Sutherland,et al.  FlyBase: enhancing Drosophila Gene Ontology annotations , 2008, Nucleic Acids Res..

[184]  C. Burge,et al.  Prediction of Mammalian MicroRNA Targets , 2003, Cell.

[185]  Chiara Vecchi,et al.  Dynamic Recruitment of NF-Y and Histone Acetyltransferases on Cell-cycle Promoters* , 2003, Journal of Biological Chemistry.

[186]  T. Jacks,et al.  Oct-2, although not required for early B-cell development, is critical for later B-cell maturation and for postnatal survival. , 1993, Genes & development.

[187]  Mihai Pop,et al.  Genome Sequence Assembly: Algorithms and Issues , 2002, Computer.

[188]  Christopher M. Player,et al.  Large-Scale Sequencing Reveals 21U-RNAs and Additional MicroRNAs and Endogenous siRNAs in C. elegans , 2006, Cell.

[189]  Michael Q. Zhang,et al.  Analysis of the Vertebrate Insulator Protein CTCF-Binding Sites in the Human Genome , 2007, Cell.

[190]  Benedict Paten,et al.  The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates , 2005, Genome Biology.

[191]  T. Mikkelsen,et al.  Rapid dissection and model-based optimization of inducible enhancers in human cells using a massively parallel reporter assay , 2012, Nature Biotechnology.

[192]  D. Levy,et al.  Cooperation between STAT3 and c-jun suppresses Fas transcription. , 2001, Molecular cell.

[193]  K. Lindblad-Toh,et al.  Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals , 2005, Nature.

[194]  P. Jelinic,et al.  The Testis-Specific Factor CTCFL Cooperates with the Protein Methyltransferase PRMT7 in H19 Imprinting Control Region Methylation , 2006, PLoS biology.

[195]  A. Sandelin,et al.  Applied bioinformatics for the identification of regulatory elements , 2004, Nature Reviews Genetics.

[196]  Richard C. McEachin,et al.  Computationally Identifying Novel NF-κB-Regulated Immune Genes in the Human Genome , 2003 .

[197]  Ana Kozomara,et al.  miRBase: integrating microRNA annotation and deep-sequencing data , 2010, Nucleic Acids Res..

[198]  Dustin E. Schones,et al.  High-Resolution Profiling of Histone Methylations in the Human Genome , 2007, Cell.

[199]  Manolis Kellis,et al.  The Tasmanian Devil Transcriptome Reveals Schwann Cell Origins of a Clonally Transmissible Cancer , 2009, Science.

[200]  M. Levine,et al.  DEAF-1 regulates immunity gene expression in Drosophila , 2008, Proceedings of the National Academy of Sciences.

[201]  C. Burge,et al.  The microRNAs of Caenorhabditis elegans. , 2003, Genes & development.

[202]  Shane J. Neph,et al.  An expansive human regulatory lexicon encoded in transcription factor footprints , 2012, Nature.

[203]  Klaus H. Kaestner,et al.  The initiation of liver development is dependent on Foxa transcription factors , 2005, Nature.

[204]  Z. Weng,et al.  A Global Map of p53 Transcription-Factor Binding Sites in the Human Genome , 2006, Cell.

[205]  M. R. Adams,et al.  Comparative genomics of the eukaryotes. , 2000, Science.

[206]  P. Pitha,et al.  The IRF family, revisited. , 2007, Biochimie.

[207]  E. Liu,et al.  An Oestrogen Receptor α-bound Human Chromatin Interactome , 2009, Nature.

[208]  R. Russell,et al.  Animal MicroRNAs Confer Robustness to Gene Expression and Have a Significant Impact on 3′UTR Evolution , 2005, Cell.

[209]  J. Harrow,et al.  GENCODE: producing a reference annotation for ENCODE , 2006, Genome Biology.

[210]  W. Bender,et al.  MicroRNAs in the Drosophila bithorax complex. , 2008, Genes & development.

[211]  Anthony A. Philippakis,et al.  Expression-Guided In Silico Evaluation of Candidate Cis Regulatory Codes for Drosophila Muscle Founder Cells , 2006, PLoS Comput. Biol..

[212]  D. Odom,et al.  The opposing transcriptional functions of Sin3A and c-Myc are required to maintain tissue homeostasis , 2011, Nature Cell Biology.

[213]  Hui Liu,et al.  Tmod: toolbox of motif discovery , 2010, Bioinform..

[214]  Jay Shendure,et al.  High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis , 2009, Nature Biotechnology.

[215]  Manolis Kellis,et al.  Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. , 2007, Genome research.

[216]  Aaron R. Quinlan,et al.  Bioinformatics Applications Note Genome Analysis Bedtools: a Flexible Suite of Utilities for Comparing Genomic Features , 2022 .

[217]  C. Burge,et al.  Vertebrate MicroRNA Genes , 2003, Science.

[218]  C. T. Farley,et al.  Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome , 2008 .

[219]  Deborah J. Andrew,et al.  CrebA regulates secretory activity in the Drosophila salivary gland and epidermis , 2005, Development.

[220]  M. Ashburner,et al.  Systematic determination of patterns of gene expression during Drosophila embryogenesis , 2002, Genome Biology.

[221]  Allen D. Delaney,et al.  Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing , 2007, Nature Methods.

[222]  Bonnie Berger,et al.  Methods in Comparative Genomics: Genome Correspondence, Gene Identification and Regulatory Motif Discovery , 2004, J. Comput. Biol..

[223]  Daniel E. Newburger,et al.  Variation in Homeodomain DNA Binding Revealed by High-Resolution Analysis of Sequence Preferences , 2008, Cell.

[224]  Gos Micklem,et al.  Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE , 2010, Science.

[225]  Michael Q. Zhang,et al.  Similarity of position frequency matrices for transcription factor binding sites , 2005, Bioinform..

[226]  M. Morley,et al.  Making and reading microarrays , 1999, Nature Genetics.

[227]  B. De Moor,et al.  Toucan: deciphering the cis-regulatory logic of coregulated genes. , 2003, Nucleic acids research.

[228]  Timothy J. Durham,et al.  Systematic analysis of chromatin state dynamics in nine human cell types , 2011, Nature.

[229]  B M Turner,et al.  Histone deacetylases: complex transducers of nuclear signals. , 1999, Seminars in cell & developmental biology.

[230]  P. Park,et al.  Design and analysis of ChIP-seq experiments for DNA-binding proteins , 2008, Nature Biotechnology.

[231]  L. Yu,et al.  Coordination of transcription factors, NF-Y and C/EBP beta, in the regulation of the mdr1b promoter. , 1995, Cell growth & differentiation : the molecular biology journal of the American Association for Cancer Research.

[232]  T. Mizutani,et al.  Identification of SWI·SNF Complex Subunit BAF60a as a Determinant of the Transactivation Potential of Fos/Jun Dimers , 2001, The Journal of Biological Chemistry.

[233]  F. Sanger,et al.  DNA sequencing with chain-terminating inhibitors. , 1977, Proceedings of the National Academy of Sciences of the United States of America.

[234]  S. Orkin,et al.  Role of SCL/Tal-1, GATA, and ets transcription factor binding sites for the regulation of flk-1 expression during murine vascular development. , 2000, Blood.

[235]  Eugene Berezikov,et al.  Functionally distinct regulatory RNAs generated by bidirectional transcription and processing of microRNA loci. , 2008, Genes & development.

[236]  C. Burge,et al.  Conserved Seed Pairing, Often Flanked by Adenosines, Indicates that Thousands of Human Genes are MicroRNA Targets , 2005, Cell.

[237]  Hanlee P. Ji,et al.  Next-generation DNA sequencing , 2008, Nature Biotechnology.

[238]  N. Blackstone,et al.  Molecular Biology of the Cell. Fourth Edition. By Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter. New York: Garland Science. $102.00. xxxiv + 1463 p; ill.; glossary (G:1–G:36); index (I:1–I:49); tables (T:1). ISBN: 0–8153–3218–1. [CD‐ROM included.] 2002. , 2003 .

[239]  T. Quertermous,et al.  Cooperative interaction of GATA-2 and AP1 regulates transcription of the endothelin-1 gene , 1995, Molecular and cellular biology.

[240]  F. Crick,et al.  Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid , 1953, Nature.

[241]  Gautier Koscielny,et al.  Analysis of variation at transcription factor binding sites in Drosophila and humans , 2012, Genome Biology.

[242]  J. Taylor,et al.  A role for the ETS domain transcription factor PEA3 in myogenic differentiation , 1997, Molecular and cellular biology.

[243]  Ernest Fraenkel,et al.  WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches , 2007, Nucleic Acids Res..

[244]  Ronny Lorenz,et al.  The Vienna RNA Websuite , 2008, Nucleic Acids Res..

[245]  B. Berger,et al.  ARACHNE: a whole-genome shotgun assembler. , 2002, Genome research.

[246]  Stijn van Dongen,et al.  miRBase: tools for microRNA genomics , 2007, Nucleic Acids Res..

[247]  G. Rubin,et al.  Computational identification of Drosophila microRNA genes , 2003, Genome Biology.

[248]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[249]  M. Gerstein,et al.  Variation in Transcription Factor Binding Among Humans , 2010, Science.

[250]  A. Mortazavi,et al.  Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[251]  J. Dekker,et al.  Capturing Chromosome Conformation , 2002, Science.

[252]  C. Burge,et al.  Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. , 2008, RNA.

[253]  I. Simon,et al.  Reconstructing dynamic regulatory maps , 2007, Molecular systems biology.

[254]  G. Crooks,et al.  WebLogo: a sequence logo generator. , 2004, Genome research.

[255]  Jean-Stéphane Varré,et al.  Efficient and accurate P-value computation for Position Weight Matrices , 2007, Algorithms for Molecular Biology.

[256]  L. Madisen,et al.  Identification of Bach2 as a B‐cell‐specific partner for small Maf proteins that negatively regulate the immunoglobulin heavy chain gene 3′ enhancer , 1998, The EMBO journal.

[257]  Manolis Kellis,et al.  Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes , 2008, PLoS Comput. Biol..

[258]  A. Cederbaum,et al.  Transcription Factor Nrf2 Protects HepG2 Cells against CYP2E1 plus Arachidonic Acid-dependent Toxicity , 2006, Journal of Biological Chemistry.

[259]  P. Baeuerle,et al.  Function and activation of NF-kappa B in the immune system. , 1994, Annual review of immunology.

[260]  Raymond Dingledine,et al.  Transcriptional repression by REST: recruitment of Sin3A and histone deacetylase to neuronal genes , 1999, Nature Neuroscience.