Estimating the proportion of true null hypotheses when the statistics are discrete

MOTIVATION: In high-dimensional testing problems, π0, the proportion of null hypotheses that are true, is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support, and the null distribution may depend on an ancillary statistic, such as a table margin, that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. RESULTS: This article introduces a number of π0 estimators, the regression and 'T' methods, that perform well with discrete test statistics, and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. AVAILABILITY AND IMPLEMENTATION: The methods are implemented in R.
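To make the dependence on a uniform null concrete, the following minimal Python sketch computes a tail-proportion estimate of π0 of the kind commonly used for continuous tests, namely the fraction of P values above a threshold λ divided by 1 − λ. It is not the regression or 'T' method introduced in the article. The function names, the choice of λ, and the one-sided binomial test with success probability 1/2 are all illustrative assumptions. The helper `null_tail_ratio` returns the value the uncapped estimate targets when every null is true: it is 1 for uniform P values, but for discrete P values it depends on where λ falls relative to the support.

```python
from math import comb


def storey_pi0(pvalues, lam=0.5):
    """Tail-proportion estimate of pi0: #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvalues)
    if m == 0:
        raise ValueError("pvalues must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def binom_upper_pvalue(x, n):
    """One-sided P value P(X >= x) for X ~ Binomial(n, 1/2) (assumed test)."""
    return sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n


def null_tail_ratio(n, lam):
    """P(p > lam) / (1 - lam) under the null for the binomial test above.

    For a uniform null distribution this ratio is exactly 1 for every lam.
    """
    prob = sum(
        comb(n, x) / 2 ** n
        for x in range(n + 1)
        if binom_upper_pvalue(x, n) > lam
    )
    return prob / (1.0 - lam)


if __name__ == "__main__":
    # With n = 5 the attainable P values are 1/32, 6/32, 16/32, 26/32, 31/32, 1.
    for lam in (0.4, 0.5, 0.6):
        print(lam, null_tail_ratio(5, lam))
```

In this toy setting the ratio equals 1 when λ = 0.5, which happens to be an attainable P value, but exceeds 1 for λ = 0.4 and λ = 0.6, so the same estimator can be biased purely because of discreteness. When the support also varies from test to test, as with differing table margins, no single λ lines up with every test's support.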
