TELS: A Novel Computational Framework for Identifying Motif Signatures of Transcribed Enhancers

In mammalian cells, transcribed enhancers (TrEns) play important roles in initiating gene expression and in maintaining gene expression levels in a spatiotemporal manner. One of the most challenging open questions is how the genomic characteristics of enhancers relate to enhancer activity. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving room to explore the enhancers' DNA code more systematically. To address this problem, we developed a novel computational framework, Transcribed Enhancer Landscape Search (TELS), which aims to identify predictive cell type/tissue-specific motif signatures of TrEns. As a case study, we used TELS to compile a comprehensive catalog of motif signatures for all known TrEns identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that optimized combinations of different short motifs characterize cell type/tissue-specific TrEns. Our study is the first to report combinations of motifs that maximize performance in discriminating TrEns exclusively transcribed in one cell type/tissue from TrEns exclusively transcribed in other cell types/tissues. We also report 31 motif signatures predictive of enhancers' broad activity. TELS code and materials are publicly available at http://www.cbrc.kaust.edu.sa/TELS.
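To make the idea of short-motif features concrete, the following is a minimal sketch, not the TELS implementation. It assumes that a "short motif" can be approximated by an overlapping k-mer, and that a motif's usefulness can be roughly gauged by the difference in its mean frequency between two sets of enhancer sequences. The function names `kmer_counts` and `rank_motifs`, the choice of k, and the ranking criterion are all illustrative assumptions and do not come from the paper.

```python
from itertools import product


def kmer_counts(seq, k=2):
    """Return the normalized frequency of every DNA k-mer in seq.

    Windows overlap; windows containing characters other than A/C/G/T
    (for example N) are skipped. Illustrative helper, not part of TELS.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return {m: (c / total if total else 0.0) for m, c in counts.items()}


def rank_motifs(pos_seqs, neg_seqs, k=2):
    """Rank k-mers by absolute difference in mean frequency between sets.

    pos_seqs and neg_seqs are non-empty lists of DNA strings, e.g. enhancers
    transcribed in one cell type versus enhancers transcribed elsewhere.
    This simple criterion is an assumption made for illustration only.
    """
    def mean_freqs(seqs):
        profiles = [kmer_counts(s, k) for s in seqs]
        return {m: sum(p[m] for p in profiles) / len(profiles)
                for m in profiles[0]}

    pos, neg = mean_freqs(pos_seqs), mean_freqs(neg_seqs)
    return sorted(pos, key=lambda m: abs(pos[m] - neg[m]), reverse=True)
```

Calling `rank_motifs(["GAGAGAGA"], ["CACACACA"], k=2)` places the dinucleotides that differ most between the two toy sets at the front of the list. A full signature search would go further and evaluate combinations of such features with a classifier, which this sketch does not attempt.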
