Differential peak calling of ChIP-seq signals with replicates with THOR

The study of changes in protein–DNA interactions measured by ChIP-seq in dynamic systems, such as cell differentiation, responses to treatment or the comparison of healthy and diseased individuals, is still an open challenge. Few computational methods can compare ChIP-seq signals between conditions when replicates are available, and none of these previous approaches addresses the ChIP-seq-specific experimental artefacts that arise in studies with biological replicates. We propose THOR, a hidden Markov model (HMM)-based approach to detect differential peaks between pairs of biological conditions with replicates. THOR provides all pre- and post-processing steps required in ChIP-seq analyses. We also propose a novel normalization approach based on housekeeping genes to handle cases where replicates have distinct signal-to-noise ratios. To evaluate differential peak calling methods, we delineate a methodology using both biological and simulated data. This includes an evaluation procedure that associates differential peaks with changes in gene expression and in histone modifications close to these peaks. We evaluate THOR and seven competing methods on data sets with distinct characteristics, ranging from in vitro studies with technical replicates to clinical studies of cancer patients. Our evaluation comprises 13 comparisons between pairs of biological conditions. We show that THOR performs best in all scenarios.
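To make the two ingredients named above concrete, the following is a minimal toy sketch, not THOR's implementation. The abstract only says that normalization is based on housekeeping genes and that peak calling uses an HMM; everything else here is an assumption for illustration. Each replicate is scaled so that its total signal over housekeeping-gene regions matches the mean over all replicates, scaled replicates are averaged per condition, and a three-state HMM (background, gain in condition 1, gain in condition 2) with fixed-mean Poisson emissions is decoded with the Viterbi algorithm. The function names, state layout, emission means and transition probability are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical state labels for a toy differential HMM (assumption, not THOR's model).
STATES = ("background", "gain_cond1", "gain_cond2")


def housekeeping_factors(hk_signal: Sequence[float]) -> List[float]:
    """Per-replicate scale factor so that each replicate's total signal over
    housekeeping-gene regions equals the mean over all replicates (assumption)."""
    target = sum(hk_signal) / len(hk_signal)
    return [target / s for s in hk_signal]


def pool_condition(replicates: Sequence[Sequence[int]], factors: Sequence[float]) -> List[int]:
    """Average the scaled replicates of one condition bin by bin, rounded to counts."""
    n_bins = len(replicates[0])
    return [
        round(sum(f * rep[i] for rep, f in zip(replicates, factors)) / len(replicates))
        for i in range(n_bins)
    ]


def poisson_logpmf(k: int, lam: float) -> float:
    """Log of the Poisson probability of observing k reads given mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_differential(c1: Sequence[int], c2: Sequence[int],
                         lam_bg: float = 3.0, lam_peak: float = 15.0,
                         stay: float = 0.9) -> List[int]:
    """Most likely state per bin for a toy 3-state HMM over paired bin counts."""
    # Expected (condition 1, condition 2) read count in each state.
    means = [(lam_bg, lam_bg), (lam_peak, lam_bg), (lam_bg, lam_peak)]
    n = len(means)
    # Staying in a state has probability `stay`; the rest is split evenly.
    log_trans = [[math.log(stay) if i == j else math.log((1 - stay) / (n - 1))
                  for j in range(n)] for i in range(n)]

    def emit(s: int, t: int) -> float:
        m1, m2 = means[s]
        return poisson_logpmf(c1[t], m1) + poisson_logpmf(c2[t], m2)

    # Uniform initial distribution, then dynamic programming over bins.
    score = [math.log(1.0 / n) + emit(s, 0) for s in range(n)]
    backptrs = []
    for t in range(1, len(c1)):
        new_score, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[prev] + log_trans[prev][s] + emit(s, t))
            ptr.append(prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy example: the second replicate of condition 1 has twice the sequencing depth.
hk = [100, 200, 150, 150]  # cond1 rep1, cond1 rep2, cond2 rep1, cond2 rep2
f = housekeeping_factors(hk)  # [1.5, 0.75, 1.0, 1.0]
c1 = pool_condition([[2, 2, 10, 12, 2, 2, 2, 2], [4, 4, 20, 24, 4, 4, 4, 4]], f[:2])
c2 = pool_condition([[2, 3, 2, 3, 2, 14, 12, 2], [4, 3, 2, 3, 2, 12, 14, 2]], f[2:])
path = viterbi_differential(c1, c2)
print([STATES[s] for s in path])
# ['background', 'background', 'gain_cond1', 'gain_cond1', 'background',
#  'gain_cond2', 'gain_cond2', 'background']
```

In this toy setting the housekeeping-based factors remove the depth difference between the two condition-1 replicates before pooling, and the HMM's preference for staying in a state groups neighbouring enriched bins into contiguous differential regions instead of isolated calls.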