The Poisson Margin Test for Normalization-Free Significance Analysis of NGS Data

Current methods for determining the statistical significance of peaks and regions in next-generation sequencing (NGS) data require an explicit normalization step to compensate for (global or local) imbalances in the sizes of sequenced and mapped libraries. There is no canonical method for performing such compensation; hence, a number of different procedures serving this goal in different ways can be found in the literature. Unfortunately, normalization has a significant impact on the final results: different methods yield very different numbers of detected "significant peaks" even in the simplest scenario of ChIP-Seq experiments that compare the enrichment in a single sample against a matched control. The issue becomes even more acute in the general case of comparing multiple samples, where a number of arbitrary design choices are required at the data analysis stage, each potentially leading to (significantly) different outcomes. In this article, we investigate a principled statistical procedure that eliminates the need for a normalization step. We outline its basic properties, in particular its scaling with sequencing depth. For illustration and comparison, we report the results of re-analyzing a ChIP-Seq experiment for transcription factor binding site detection. To quantify the differences between outcomes, we use a novel method based on the accuracy of in silico prediction by support vector machine (SVM) models trained on part of the genome and tested on the remainder. See Kowalczyk et al. (2009) for supplementary material.
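The sensitivity to normalization described above can be made concrete with a small numerical sketch. The following Python snippet is not the Poisson margin test itself; it is a generic, hypothetical window-calling scheme (toy counts, a simple Poisson upper-tail test against a scaled control rate, an arbitrary significance threshold) used only to show that the number of "significant" windows changes with the chosen normalization factor.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 minus the lower tail."""
    p, term = 0.0, math.exp(-lam)
    for i in range(k):          # accumulate P(X = 0), ..., P(X = k-1)
        p += term
        term *= lam / (i + 1)   # recurrence: P(X = i+1) = P(X = i) * lam / (i+1)
    return max(0.0, 1.0 - p)

def significant_windows(sample_counts, control_counts, norm_factor, alpha=1e-3):
    """Call a window significant when its sample count is improbably large
    under a Poisson null whose rate is the normalized control count."""
    calls = []
    for s, c in zip(sample_counts, control_counts):
        lam = max(norm_factor * c, 0.5)   # floor keeps the null rate positive
        calls.append(poisson_sf(s, lam) < alpha)
    return calls

# toy read counts for a handful of genomic windows (entirely hypothetical)
sample  = [40, 12, 7, 55, 3, 30]
control = [10, 10, 6, 20, 4, 9]

# the same data analyzed under two plausible normalization factors
for f in (1.0, 2.0):
    n = sum(significant_windows(sample, control, f))
    print(f"norm factor {f}: {n} significant windows")
```

Even on these six toy windows, the two normalization factors produce different peak counts, which is the kind of analyst-dependent variability the normalization-free procedure is designed to avoid.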