Paper information - Robust adjustment of sequence tag abundance

Robust adjustment of sequence tag abundance

MOTIVATION: The majority of next-generation sequencing technologies effectively sample small amounts of DNA or RNA that are amplified (i.e. copied) before sequencing. The amplification process is not perfect, leading to extreme bias in sequenced read counts. We present a novel procedure to account for amplification bias and demonstrate its effectiveness in mitigating gene length dependence when estimating true gene expression.

RESULTS: We tested the proposed method on simulated and real data. Simulations indicated that our method captures true gene expression more effectively than classic censoring-based approaches and leads to power gains in differential expression testing, particularly for shorter genes with high transcription rates. We applied our method to an unreplicated Arabidopsis RNA-seq dataset resulting in disparate gene ontologies arising from gene set enrichment analyses.

AVAILABILITY AND IMPLEMENTATION: R code to perform the RASTA procedures is freely available on the web at www.stat.purdue.edu/~doerge/.

Rebecca W. Doerge | Douglas D. Baumann
