Progress and challenges in bioinformatics approaches for enhancer identification

Enhancers are cis-acting DNA elements that play critical roles in the distal regulation of gene expression. Identifying enhancers is an important step toward understanding the distinct gene expression programs that may reflect normal and pathogenic cellular conditions. Experimental identification of enhancers is constrained by the set of conditions used in the experiment: because an enhancer can be active under specific cellular conditions but inactive in other cell types, tissues or cellular states, multiple experiments are required to identify enhancers. This constraint has opened prospects for computational prediction methods that can be used for high-throughput identification of putative enhancers to complement experimental approaches. Potential functions and properties of predicted enhancers have been catalogued and summarized in several enhancer-oriented databases. Because current methods for the computational prediction of enhancers produce significantly different enhancer predictions, the research community will benefit from an overview of the strategies and solutions developed in this field. In this review, we focus on the identification and analysis of enhancers by bioinformatics approaches. First, we describe a general framework for computational identification of enhancers, present relevant data types and discuss possible computational solutions. Next, we cover over 30 existing computational enhancer identification methods that have been developed since 2000. Our review highlights their advantages, limitations and potential, and suggests pragmatic guidelines for the development of more efficient computational enhancer prediction methods. Finally, we discuss challenges and open problems in this area that require further consideration.
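
The observation that different methods produce significantly different enhancer predictions can be made concrete by measuring how much two prediction sets overlap. The sketch below is an illustration only and is not taken from the review: it assumes each method reports enhancers as half-open (start, end) intervals on a single chromosome, and it uses a base-pair Jaccard index as one possible agreement measure. The function names and the example coordinates are hypothetical.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) coordinates on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard_bp(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index: shared bases divided by bases covered by either set."""
    union = covered_bp(a + b)
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union


# Hypothetical predictions from two methods on the same chromosome.
method_a = [(100, 200), (500, 700)]
method_b = [(150, 250), (900, 1000)]
agreement = jaccard_bp(method_a, method_b)
```

A value near 1 would mean the two methods largely agree on which bases are enhancers, and a value near 0 would mean they agree on almost nothing. A real comparison would also need to handle multiple chromosomes and a shared genome assembly.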
