DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data

Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although next-generation sequencers provide the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because the sequencing data sometimes consist of >100 million reads per sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. In addition, DROMPA outputs a protein-binding profile map in PDF or PNG format, which can be easily manipulated by users who have a limited background in bioinformatics.
