Paper information - CNV-guided multi-read allocation for ChIP-seq

CNV-guided multi-read allocation for ChIP-seq

MOTIVATION: In chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and other short-read sequencing experiments, a considerable fraction of the short reads align to multiple locations on the reference genome (multi-reads). Inferring the origin of multi-reads is critical for accurately mapping reads to repetitive regions. Current state-of-the-art multi-read allocation algorithms rely on the read counts in the local neighborhood of the alignment locations and ignore the variation in the copy numbers of these regions. Copy-number variation (CNV) can directly affect the read densities and, therefore, bias allocation of multi-reads.

RESULTS: We propose cnvCSEM (CNV-guided ChIP-Seq by expectation-maximization algorithm), a flexible framework that incorporates CNV in multi-read allocation. cnvCSEM eliminates the CNV bias in multi-read allocation by initializing the read allocation algorithm with CNV-aware initial values. Our data-driven simulations illustrate that cnvCSEM leads to higher read coverage with satisfactory accuracy and lower loss in read-depth recovery (estimation). We evaluate the biological relevance of the cnvCSEM-allocated reads and the resultant peaks with the analysis of several ENCODE ChIP-seq datasets.

AVAILABILITY AND IMPLEMENTATION: Available at http://www.stat.wisc.edu/~qizhang/

CONTACT: qizhang@stat.wisc.edu or keles@stat.wisc.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
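The allocation idea summarized in the abstract can be illustrated with a toy sketch. This is not the cnvCSEM implementation: the function name `allocate_multireads`, the region labels, the counts, and in particular the choice to divide unique-read counts by a copy-number estimate to obtain "CNV-aware" starting weights are all assumptions made here for illustration. The sketch only shows how the starting weights of an EM-style allocator change the fractional assignment of multi-reads.

```python
def allocate_multireads(unique_counts, multireads, init_weights,
                        n_iter=1, pseudo=1e-6):
    """Toy EM-style allocator (hypothetical, not the published algorithm).

    unique_counts: dict region -> number of uniquely aligned reads
    multireads:    list of tuples, each holding the candidate regions of one read
    init_weights:  dict region -> starting weight for the first E-step
    Returns one dict per multi-read mapping region -> allocated fraction.
    """
    weights = dict(init_weights)
    alloc = []
    for _ in range(n_iter):
        # E-step: split each multi-read in proportion to the current weights.
        alloc = []
        for candidates in multireads:
            total = sum(weights[r] + pseudo for r in candidates)
            alloc.append({r: (weights[r] + pseudo) / total for r in candidates})
        # M-step: new weight = unique reads + expected multi-reads per region.
        weights = {r: float(c) for r, c in unique_counts.items()}
        for fractions in alloc:
            for r, f in fractions.items():
                weights[r] += f
    return alloc


# Made-up example: region A is amplified (3 copies), region B is not.
unique_counts = {"A": 30, "B": 10}
copy_number = {"A": 3, "B": 1}          # assumed to come from a CNV caller
multireads = [("A", "B")] * 4

# Count-based start: weights are the raw unique-read counts.
naive_init = dict(unique_counts)
# Assumed CNV-aware start: unique-read counts scaled by copy number.
cnv_init = {r: unique_counts[r] / copy_number[r] for r in unique_counts}

naive_alloc = allocate_multireads(unique_counts, multireads, naive_init)
cnv_alloc = allocate_multireads(unique_counts, multireads, cnv_init)
```

With the count-based start, the first E-step sends about three quarters of each multi-read to the amplified region A; with the copy-number-scaled start, the split is even. In this toy the M-step resets weights from raw counts, so the starting values matter mainly in early iterations; how the published method uses CNV information is described in the paper itself.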

Qi Zhang | Sündüz Keles
