EAGLE: An algorithm that utilizes a small number of genomic features to predict tissue/cell type-specific enhancer-gene interactions

Long-range regulation by distal enhancers is crucial for many biological processes. Existing methods for enhancer-target gene prediction often require many genomic features, which makes them difficult to apply to the many cell types for which the relevant datasets are not available. Here, we present EAGLE, an enhancer and gene learning ensemble method for the identification of Enhancer-Gene (EG) interactions. Unlike existing tools, EAGLE uses only six features derived from the genomic features of enhancers and from gene expression datasets. Cross-validation showed that EAGLE outperformed existing methods. Enrichment analyses on specific transcription factors, epigenetic modifications, and eQTLs demonstrated that EAGLE could distinguish interacting pairs from non-interacting ones. Finally, EAGLE was applied to the mouse and human genomes and identified 7,680,203 and 7,437,255 EG interactions involving 31,375 and 43,724 genes and 138,547 and 177,062 enhancers across 89 and 110 tissue/cell types in mouse and human, respectively. The EAGLE method is available at https://github.com/EvansGao/EAGLE, and the predicted interactions are accessible through the interactive database at http://www.enhanceratlas.org/.

Author summary

Enhancers are DNA sequences that interact with promoters and activate target genes. Because enhancers are often located far from their target genes, and the nearest gene is not always the target, predicting enhancer-target gene relationships is a major challenge. Although a few computational tools have been designed for enhancer-target gene prediction, they are difficult to apply in most tissue/cell types because sufficient genomic datasets are lacking. Here we propose a new method, EAGLE, which uses a small number of genomic features to predict tissue/cell type-specific enhancer-gene interactions. Compared with existing tools, EAGLE performed better in 10-fold cross-validation and in cross-sample tests. Moreover, the predictions made by EAGLE were supported by independent evidence, such as the enrichment of relevant transcription factors, epigenetic modifications, and eQTLs. Finally, we integrated the enhancer-target relationships obtained from the human and mouse genomes into an interactive database, EnhancerAtlas (http://www.enhanceratlas.org/).
