Genome-wide discovery of human heart enhancers.

The various organogenic programs deployed during embryonic development rely on the precise expression of a multitude of genes in time and space. Identifying the cis-regulatory elements responsible for this tightly orchestrated regulation of gene expression is an essential step in understanding the genetic pathways involved in development. We describe a strategy to systematically identify tissue-specific cis-regulatory elements that share combinations of sequence motifs. Using heart development as an experimental framework, we employed a combination of Gibbs sampling and linear regression to build a classifier that identifies heart enhancers based on the presence or absence of various sequence features, including known and putative transcription factor (TF) binding specificities. In distinguishing heart enhancers from a large pool of random noncoding sequences, our classifier substantially outperforms four commonly used methods, with an accuracy reaching 92% in cross-validation. Furthermore, most of the binding specificities learned by our method resemble the specificities of TFs widely recognized as key players in heart development and differentiation, such as SRF, MEF2, ETS1, SMAD, and GATA. Using our classifier as a predictor, a genome-wide scan identified over 40,000 novel human heart enhancers. Although the classifier used no gene expression information, these novel enhancers are strongly associated with genes expressed in the heart. Finally, in vivo tests of our predictions in mouse and zebrafish achieved a validation rate of 62%, significantly higher than expected by chance. These results support the existence of underlying cis-regulatory codes dictating tissue-specific transcription in mammalian genomes and validate our enhancer classifier strategy as a method to uncover these regulatory codes.
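
The idea of scoring a sequence by a weighted combination of motif features can be illustrated with a minimal sketch. It is not the authors' implementation: the motif strings, the toy sequences, the penalty value `lam`, and all function names are hypothetical. Fixed consensus strings stand in for motifs that the described method learns by Gibbs sampling, and an L1-penalised (lasso) linear regression is assumed here as one plausible form of the regression step.

```python
def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))

def count_motif(seq, motif):
    """Count windows matching the motif or its reverse complement."""
    targets = {motif, revcomp(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)

def featurize(seqs, motifs):
    """One row per sequence, one motif-count column per motif."""
    return [[float(count_motif(s, m)) for m in motifs] for s in seqs]

def soft_threshold(z, lam):
    """Shrink z towards zero by lam (the lasso proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_wo_j = b0 + sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                zj += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
        # The intercept is not penalised: refit it as the mean residual.
        b0 = sum(y[i] - sum(X[i][k] * w[k] for k in range(p)) for i in range(n)) / n
    return b0, w

def score(features, b0, w):
    """Linear score; a positive value is read as 'heart enhancer'."""
    return b0 + sum(f * wk for f, wk in zip(features, w))

# Hypothetical toy motifs: GATA-like, MEF2-like, and an unrelated decoy.
motifs = ["AGATAA", "CTAAAAATAG", "CCCGGG"]
# Hypothetical toy sequences: two "enhancers" (+1) and two background (-1).
seqs = [
    "TTAGATAACCGTCTAAAAATAGGC",
    "GGCTATTTTTAGACGTTATCTCA",   # same motifs on the opposite strand
    "CCCGGGTTTGCACGTGCACCGT",
    "ACGTACGTCCCGGGACGTACGT",
]
labels = [1.0, 1.0, -1.0, -1.0]

X = featurize(seqs, motifs)
b0, w = lasso_fit(X, labels, lam=0.1)
scores = [score(row, b0, w) for row in X]
```

In this toy setting the fitted weights separate the two classes by the sign of the score. A real application would use position weight matrices rather than exact consensus matches, far more sequences, and held-out data for cross-validation.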
