An effective differential expression analysis of deep-sequencing data based on the Poisson log-normal model

Tremendous amount of deep-sequencing data has unprecedentedly improved our understanding in biomedical science by digital sequence reads. To mine useful information from such data, a proper distribution for modeling all range of the count data and accurate parameter estimation are required. In this paper, we propose a method, called "DEPln," for differential expression analysis based on the Poisson log-normal (PLN) distribution with an accurate parameter estimation strategy, which aims to overcome the inconvenience in the mathematical analysis of the traditional PLN distribution. The performance of our proposed method is validated by both synthetic and real data. Experimental results indicate that our method outperforms the traditional methods in terms of the discrimination ability and results in a good tradeoff between the recall rate and the precision. Thus, our work provides a new approach for gene expression analysis and has strong potential in deep-sequencing based research.

[1]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[2]  Sandrine Dudoit,et al.  Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments , 2010, BMC Bioinformatics.

[3]  Daniel Bottomly,et al.  Evaluating Gene Expression in C57BL/6J and DBA/2J Mouse Striatum Using RNA-Seq and Microarrays , 2011, PloS one.

[4]  Mark E. J. Newman,et al.  Power-Law Distributions in Empirical Data , 2007, SIAM Rev..

[5]  Alyssa C. Frazee,et al.  ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets , 2011, BMC Bioinformatics.

[6]  Thomas J. Hardcastle,et al.  baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data , 2010, BMC Bioinformatics.

[7]  Aeilko H. Zwinderman,et al.  Modeling Sage data with a truncated gamma-Poisson model , 2006, BMC Bioinformatics.

[8]  M. Marra,et al.  Applications of next-generation sequencing technologies in functional genomics. , 2008, Genomics.

[9]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[10]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[11]  S. Luo,et al.  mRNA-seq with agnostic splice site discovery for nervous system transcriptomics tested in chronic pain. , 2010, Genome research.

[12]  Andrea Rau,et al.  A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data , 2013, PloS one.

[13]  M. Robinson,et al.  Small-sample estimation of negative binomial dispersion, with applications to SAGE data. , 2007, Biostatistics.

[14]  Martin S. Taylor,et al.  The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line , 2009, Nature Genetics.

[15]  Piotr J. Balwierz,et al.  Methods for analyzing deep sequencing expression data: constructing the human and mouse promoterome with deepCAGE data , 2009, Genome Biology.

[16]  M. Bulmer On Fitting the Poisson Lognormal Distribution to Species-Abundance Data , 1974 .

[17]  M. Gerstein,et al.  RNA-Seq: a revolutionary tool for transcriptomics , 2009, Nature Reviews Genetics.

[18]  Steinar Engen,et al.  Analyzing Spatial Structure of Communities Using the Two‐Dimensional Poisson Lognormal Species Abundance Model , 2002, The American Naturalist.

[19]  Xuegong Zhang,et al.  DEGseq: an R package for identifying differentially expressed genes from RNA-seq data , 2010, Bioinform..

[20]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[21]  M. Robinson,et al.  A scaling normalization method for differential expression analysis of RNA-seq data , 2010, Genome Biology.