A dynamic Bayesian network for identifying protein-binding footprints from single molecule-based sequencing data

Motivation: A global map of transcription factor binding sites (TFBSs) is critical to understanding gene regulation and genome function. DNaseI digestion of chromatin coupled with massively parallel sequencing (digital genomic footprinting) enables the identification of protein-binding footprints with high resolution on a genome-wide scale. However, accurately inferring the locations of these footprints remains a challenging computational problem.

Results: We present a dynamic Bayesian network-based approach for the identification and assignment of statistical confidence estimates to protein-binding footprints from digital genomic footprinting data. The method, DBFP, allows footprints to be identified in a probabilistic framework and outperforms our previously described algorithm in terms of precision at a fixed recall. Applied to a digital footprinting data set from Saccharomyces cerevisiae, DBFP identifies 4679 statistically significant footprints within intergenic regions. These footprints are mainly located near transcription start sites and are strongly enriched for known TFBSs. Footprints containing no known motif are preferentially located proximal to other footprints, consistent with cooperative binding of these footprints. DBFP also identifies a set of statistically significant footprints in the yeast coding regions. Many of these footprints coincide with the boundaries of antisense transcripts, and the most significant footprints are enriched for binding sites of the chromatin-associated factors Abf1 and Rap1.

Contact: jay.hesselberth@ucdenver.edu; william-noble@u.washington.edu

Supplementary information: Supplementary material is available at Bioinformatics online.
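The comparison metric named in the Results, precision at a fixed recall, can be illustrated with a minimal sketch. This is not the evaluation code used for DBFP. The function name `precision_at_recall`, the score and label inputs, and the rule of taking the shortest score-ranked prefix that reaches the target recall are all assumptions made for illustration, and tied scores are not handled specially.

```python
from typing import Sequence


def precision_at_recall(
    scores: Sequence[float],
    labels: Sequence[bool],
    target_recall: float,
) -> float:
    """Precision of the shortest score-ranked prefix whose recall reaches target_recall.

    scores: confidence for each candidate footprint (higher means more confident).
    labels: True if the candidate overlaps a known binding site (hypothetical gold standard).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidates from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    for n_called, (_, is_true) in enumerate(ranked, start=1):
        true_positives += is_true
        if true_positives / total_positives >= target_recall:
            return true_positives / n_called

    # Unreachable: the full ranking always has recall 1.0.
    raise AssertionError("recall target was never reached")


# Toy data, not taken from the paper.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [True, False, True, True, False]
print(precision_at_recall(toy_scores, toy_labels, 0.5))
```

Two methods are compared by fixing the same `target_recall` for both and reporting which one reaches it with the higher precision.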
