ChIPdig: a comprehensive user-friendly tool for mining multi-sample ChIP-seq data

In recent years, epigenetic research has enjoyed explosive growth as high-throughput sequencing technologies become more accessible and affordable. However, this advancement has not been matched with similar progress in data analysis capabilities from the perspective of experimental biologists not versed in bioinformatic languages. For instance, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is at present widely used to identify genomic loci of transcription factor binding and histone modifications. Basic ChIP-seq data analysis, including read mapping and peak calling, can be accomplished through several well-established tools, but more sophisticated analyzes aimed at comparing data derived from different conditions or experimental designs constitute a significant bottleneck. We reason that the implementation of a single comprehensive ChIP-seq analysis pipeline could be beneficial for many experimental (wet lab) researchers who would like to generate genomic data. Here we present ChIPdig, a stand-alone application with adjustable parameters designed to allow researchers to perform several analyzes, namely read mapping to a reference genome, peak calling, annotation of regions based on reference coordinates (e.g. transcription start and termination sites, exons, introns, and 5' and 3' untranslated regions), and generation of heatmaps and metaplots for visualizing coverage. Importantly, ChIPdig accepts multiple ChIP-seq datasets as input, allowing genome-wide differential enrichment analysis in regions of interest to be performed. ChIPdig is written in R and enables access to several existing and highly utilized packages through a simple user interface powered by the Shiny package. Here, we illustrate the utility and user-friendly features of ChIPdig by analyzing H3K36me3 and H3K4me3 ChIP-seq profiles generated by the modENCODE project as an example. ChIPdig offers a comprehensive and user-friendly pipeline for analysis of multiple sets of ChIP-seq data by both experimental and computational researchers. It is open source and available at https://github.com/rmesse/ChIPdig .

[1]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[2]  Sündüz Keles,et al.  Detecting differential binding of transcription factors with ChIP-seq , 2012, Bioinform..

[3]  Florian Hahne,et al.  QuasR: quantification and annotation of short reads in R , 2015, Bioinform..

[4]  B. Langmead,et al.  Aligning Short Sequencing Reads with Bowtie , 2010, Current protocols in bioinformatics.

[5]  Martial Sankar,et al.  POLYPHEMUS: R package for comparative analysis of RNA polymerase II ChIP-seq profiles by non-linear normalization , 2012, Nucleic acids research.

[6]  Clifford A. Meyer,et al.  Model-based Analysis of ChIP-Seq (MACS) , 2008, Genome Biology.

[7]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[8]  Aaron T. L. Lun,et al.  csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows , 2015, Nucleic acids research.

[9]  Qing-Yu He,et al.  ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization , 2015, Bioinform..

[10]  S. Orkin,et al.  METHOD Open Access , 2014 .

[11]  Chen Zeng,et al.  A clustering approach for identification of enriched domains from histone modification ChIP-Seq data , 2009, Bioinform..

[12]  M. Robinson,et al.  A scaling normalization method for differential expression analysis of RNA-seq data , 2010, Genome Biology.

[13]  A. Rechtsteiner,et al.  Broad chromosomal domains of histone modification patterns in C. elegans. , 2011, Genome research.

[14]  Robert Gentleman,et al.  Software for Computing and Annotating Genomic Ranges , 2013, PLoS Comput. Biol..

[15]  Clifford A. Meyer,et al.  Identifying and mitigating bias in next-generation sequencing methods for chromatin biology , 2014, Nature Reviews Genetics.

[16]  Yong Zhang,et al.  Identifying ChIP-seq enrichment using MACS , 2012, Nature Protocols.

[17]  J. Michael Cherry,et al.  The Encyclopedia of DNA elements (ENCODE): data portal update , 2017, Nucleic Acids Res..

[18]  J. Hesselberth,et al.  valr: Reproducible genome interval analysis in R , 2017, F1000Research.

[19]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[20]  Daniel J. Blankenberg,et al.  Galaxy: a platform for interactive large-scale genome analysis. , 2005, Genome research.

[21]  Simon Tavaré,et al.  BayesPeak—an R package for analysing ChIP-seq data , 2011, Bioinform..

[22]  Marco-Antonio Mendoza-Parra,et al.  Characterising ChIP-seq binding patterns by model-based peak shape deconvolution , 2013, BMC Genomics.

[23]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..