Detection of active transcription factor binding sites with the combination of DNase hypersensitivity and histone modifications

MOTIVATION The identification of active transcriptional regulatory elements is crucial to understand regulatory networks driving cellular processes such as cell development and the onset of diseases. It has recently been shown that chromatin structure information, such as DNase I hypersensitivity (DHS) or histone modifications, significantly improves cell-specific predictions of transcription factor binding sites. However, no method has so far successfully combined both DHS and histone modification data to perform active binding site prediction. RESULTS We propose here a method based on hidden Markov models to integrate DHS and histone modifications occupancy for the detection of open chromatin regions and active binding sites. We have created a framework that includes treatment of genomic signals, model training and genome-wide application. In a comparative analysis, our method obtained a good trade-off between sensitivity versus specificity and superior area under the curve statistics than competing methods. Moreover, our technique does not require further training or sequence information to generate binding location predictions. Therefore, the method can be easily applied on new cell types and allow flexible downstream analysis such as de novo motif finding. AVAILABILITY AND IMPLEMENTATION Our framework is available as part of the Regulatory Genomics Toolbox. The software information and all benchmarking data are available at http://costalab.org/wp/dh-hmm. CONTACT ivan.costa@rwth-aachen.de or eduardo.gusmao@rwth-aachen.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

[1]  B. Ren,et al.  Genome-wide prediction of transcription factor binding sites using an integrated model , 2010, Genome Biology.

[2]  Bartek Wilczynski,et al.  Biopython: freely available Python tools for computational molecular biology and bioinformatics , 2009, Bioinform..

[3]  Nathan C. Sheffield,et al.  The accessible chromatin landscape of the human genome , 2012, Nature.

[4]  Shane J. Neph,et al.  An expansive human regulatory lexicon encoded in transcription factor footprints , 2012, Nature.

[5]  H. H. Madden Comments on the Savitzky-Golay convolution method for least-squares-fit smoothing and differentiation of digital data , 1976 .

[6]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[7]  Jie Wang,et al.  Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium , 2012, Nucleic Acids Res..

[8]  Gary D. Stormo,et al.  DNA binding sites: representation and discovery , 2000, Bioinform..

[9]  William Stafford Noble,et al.  Epigenetic priors for identifying active transcription factor binding sites , 2012, Bioinform..

[10]  Anna Lyubetskaya,et al.  Transcription Factor Binding Site Mapping Using ChIP-Seq. , 2014, Microbiology spectrum.

[11]  M. Daly,et al.  Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). , 2005, Genome research.

[12]  Vsevolod J. Makeev,et al.  Motif discovery and motif finding from genome-mapped DNase footprint data , 2009, Bioinform..

[13]  Ivan G. Costa,et al.  Prediction of Transcription Factor Binding Sites by Integrating DNase Digestion and Histone Modification , 2012, BSB.

[14]  S. Orkin,et al.  An Extended Transcriptional Network for Pluripotency of Embryonic Stem Cells , 2008, Cell.

[15]  W. Wong,et al.  ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells , 2009, Proceedings of the National Academy of Sciences.

[16]  David J. Arenillas,et al.  JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles , 2013, Nucleic Acids Res..

[17]  Jacob F. Degner,et al.  Sequence and Chromatin Accessibility Data Accurate Inference of Transcription Factor Binding from Dna Material Supplemental Open Access , 2022 .

[18]  D. Schübeler,et al.  Determinants and dynamics of genome accessibility , 2011, Nature Reviews Genetics.

[19]  Martin Renqiang Min,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[20]  Nathan C. Sheffield,et al.  Predicting cell-type–specific gene expression from regions of open chromatin , 2012, Genome research.

[21]  E. Birney,et al.  High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. , 2011, Genome research.

[22]  Data production leads,et al.  An integrated encyclopedia of DNA elements in the human genome , 2012 .

[23]  Terrence S. Furey,et al.  F-Seq: a feature density estimator for high-throughput sequence tags , 2008, Bioinform..

[24]  Bing Ren,et al.  Discovery and Annotation of Functional Chromatin Signatures in the Human Genome , 2009, PLoS Comput. Biol..

[25]  William Stafford Noble,et al.  Sequence and chromatin determinants of cell-type–specific transcription factor binding , 2012, Genome research.

[26]  Martha L. Bulyk,et al.  UniPROBE, update 2011: expanded content and search tools in the online database of protein-binding microarray data on protein–DNA interactions , 2010, Nucleic Acids Res..

[27]  Maxwell W. Libbrecht,et al.  Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements , 2012, Genome research.

[28]  Marc D. Perry,et al.  ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia , 2012, Genome research.

[29]  M. Facciotti,et al.  Evaluation of Algorithm Performance in ChIP-Seq Peak Detection , 2010, PloS one.

[30]  Bartek Wilczynski,et al.  Finding evolutionarily conserved cis-regulatory modules with a universal set of motifs , 2009, BMC Bioinformatics.

[31]  T. Bailey,et al.  High-throughput chromatin information enables accurate tissue-specific prediction of transcription factor binding sites , 2008, Nucleic acids research.

[32]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[33]  Clifford A. Meyer,et al.  Model-based Analysis of ChIP-Seq (MACS) , 2008, Genome Biology.

[34]  Alexander E. Kel,et al.  TRANSFAC® and its module TRANSCompel®: transcriptional gene regulation in eukaryotes , 2005, Nucleic Acids Res..

[35]  Michael R. Green,et al.  Transcriptional regulatory elements in the human genome. , 2006, Annual review of genomics and human genetics.