Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an increasingly common experimental approach to generate genome-wide maps of histone modifications and to dissect the complexity of the epigenome. Here, we propose EpiCSeg: a novel algorithm that combines several histone modification maps for the segmentation and characterization of cell-type specific epigenomic landscapes. By using an accurate probabilistic model for the read counts, EpiCSeg provides a useful annotation for a considerably larger portion of the genome, shows a stronger association with validation data, and yields more consistent predictions across replicate experiments when compared to existing methods.The software is available at http://github.com/lamortenera/epicseg

[1]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[2]  William Stafford Noble,et al.  Unsupervised pattern discovery in human chromatin structure through genomic segmentation , 2012, Nature Methods.

[3]  Dirk Eddelbuettel,et al.  Rcpp: Seamless R and C++ Integration , 2011 .

[4]  J. Harrow,et al.  GENCODE: producing a reference annotation for ENCODE , 2006, Genome Biology.

[5]  J. Reeds,et al.  Compound Multinomial Likelihood Functions are Unimodal: Proof of a Conjecture of I. J. Good , 1977 .

[6]  Julia A. Lasserre,et al.  Histone modification levels are predictive for gene expression , 2010, Proceedings of the National Academy of Sciences.

[7]  B. Turner,et al.  The adjustable nucleosome: an epigenetic signaling module. , 2012, Trends in genetics : TIG.

[8]  Y. Benjamini,et al.  Summarizing and correcting the GC content bias in high-throughput sequencing , 2012, Nucleic acids research.

[9]  Guillaume J. Filion,et al.  Systematic Protein Location Mapping Reveals Five Principal Chromatin Types in Drosophila Cells , 2010, Cell.

[10]  Paul T. Groth,et al.  The ENCODE (ENCyclopedia Of DNA Elements) Project , 2004, Science.

[11]  David R. Anderson,et al.  Model selection and multimodel inference : a practical information-theoretic approach , 2003 .

[12]  A. Mortazavi,et al.  Genome-Wide Mapping of in Vivo Protein-DNA Interactions , 2007, Science.

[13]  Bing Ren,et al.  ChromaSig: A Probabilistic Approach to Finding Common Chromatin Signatures in the Human Genome , 2008, PLoS Comput. Biol..

[14]  Martin Vingron,et al.  Inferring nucleosome positions with their histone mark annotation from ChIP data , 2013, Bioinform..

[15]  Jacob F. Degner,et al.  Sequence and Chromatin Accessibility Data Accurate Inference of Transcription Factor Binding from Dna Material Supplemental Open Access , 2022 .

[16]  Nicole A. Lazar,et al.  Statistical Analysis With Missing Data , 2003, Technometrics.

[17]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[18]  Timothy J. Durham,et al.  Systematic analysis of chromatin state dynamics in nine human cell types , 2011, Nature.

[19]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[20]  T. Mikkelsen,et al.  The NIH Roadmap Epigenomics Mapping Consortium , 2010, Nature Biotechnology.

[21]  Timothy J. Durham,et al.  "Systematic" , 1966, Comput. J..

[22]  Thomas Lengauer,et al.  BLUEPRINT to decode the epigenetic signature written in blood , 2012, Nature Biotechnology.