Identifying peaks in *-seq data using shape information

Background: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the ChIP-seq analysis tool available in CLC Genomics Workbench and CLC Genomics Server (version 7.5 and up), a user-friendly peak caller designed not to be specific to a particular *-seq protocol.

Results: We illustrate the advantages of a shape-based approach and describe the algorithmic principles underlying the implementation. Thanks to the generality of the idea and the fact that the algorithm is able to learn the peak shape from the data, the implementation requires only minimal user input, while still being applicable to a range of *-seq protocols. Using independently validated benchmark datasets, we compare our implementation to other state-of-the-art algorithms explicitly designed to analyse ChIP-seq data and provide an evaluation in terms of receiver operating characteristic (ROC) plots. To show the applicability of the method to similar *-seq protocols, we also investigate algorithmic performance on DNase-seq data.

Conclusions: The results show that the CLC shape-based peak caller ranks well among popular state-of-the-art peak callers while providing flexibility and ease of use.
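The ROC-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each benchmark region carries a peak-caller score and a validated true/false label. The function names `roc_points` and `auc`, and the toy scores and labels, are hypothetical and are not taken from the paper or from CLC Genomics Workbench.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative region")

    # Rank regions from highest to lowest score, as if lowering a threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Regions with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Toy data (assumed for illustration only): five regions, three validated peaks.
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    toy_labels = [True, True, False, True, False]
    curve = roc_points(toy_scores, toy_labels)
    print(curve)
    print(auc(curve))
```

Plotting the returned points for several peak callers on the same axes gives the kind of comparison the abstract describes; the area under each curve is one way to summarise it as a single number.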
