Prediction of Transcription Factor Binding Sites by Integrating DNase Digestion and Histone Modification

Identifying cis-acting elements on DNA is crucial for understanding the complex regulatory networks that govern many cellular mechanisms. The task is difficult: the human genome is estimated to encode about 1500 different transcription factors (TFs), each of which can bind to multiple loci, directly or indirectly. The standard computational approach represents the binding preference of a transcription factor as a position weight matrix (PWM) and uses statistical procedures to detect genomic regions with high binding scores. Because most PWMs carry short and degenerate signals, this approach suffers from a very high number of false positive hits. Recent studies have shown that genome-wide assays reflecting open chromatin, such as DNase digestion or histone modifications, can improve sequence-based detection of the binding locations of transcription factors that are active in a particular cell type. We propose here a multivariate hidden Markov model (HMM) that improves the prediction of transcription factor binding locations by integrating DNase digestion and histone modification data. Compared with existing methods, our methodology improves sensitivity with little or no loss of specificity. This study shows that correctly integrating DNase and histone modification data can improve the prediction of cis-acting elements, opening the way to more sophisticated studies that use a larger set of epigenetic signals.
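To make the PWM-based baseline described above concrete, the following is a minimal sketch of log-odds PWM scanning. It is not the method proposed in this work; the count matrix, pseudocount, uniform background, and threshold are all hypothetical values chosen for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif with consensus TGCA
# (one dict per motif position; values are illustrative, not real data).
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 1, "C": 7, "G": 1, "T": 1},
    {"A": 7, "C": 1, "G": 1, "T": 1},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + len(BASES) * pseudocount
        pwm.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[pos][base] for pos, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = log_odds_pwm(COUNTS)
for start, window, score in scan("AATGCATT", pwm, threshold=3.0):
    print(start, window, round(score, 2))
```

With a short motif like this one, lowering the threshold quickly admits many windows across a genome-sized sequence, which is the false positive problem that open chromatin data are meant to mitigate.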
