Estimating the number of true null hypotheses from a histogram of p values

In an earlier article, an intuitively appealing method for estimating the number of true null hypotheses in a multiple test situation was proposed. That article presented an iterative algorithm that relies on a histogram of observed p values to obtain the estimator. We characterize the limit of that iterative algorithm and show that the estimator can be computed directly without iteration. We compare the performance of the histogram-based estimator with other procedures for estimating the number of true null hypotheses from a collection of observed p values and find that the histogram-based estimator performs well in settings similar to those encountered in microarray data analysis. We demonstrate the approach using p values from a large microarray experiment aimed at uncovering molecular mechanisms of barley resistance to a fungal pathogen.
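A direct, non-iterative computation of the kind described above might look like the following minimal sketch. The abstract does not state the formula, so the specific rule used here is an assumption: bin the p values into B equal-width bins, find the first bin whose count does not exceed the average count of that bin and all bins to its right, and take B times that tail average as the estimate. The function name `histogram_m0_estimate` and the default of 20 bins are hypothetical choices made for illustration.

```python
def histogram_m0_estimate(pvalues, bins=20):
    """Sketch of a histogram-based estimate of the number of true nulls.

    Assumed rule (not stated in the abstract): find the first bin whose
    count is no greater than the mean count of that bin and all bins to
    its right, then return bins * that tail mean.
    """
    if bins < 1:
        raise ValueError("bins must be a positive integer")

    # Count p values in equal-width bins on [0, 1]; p == 1.0 goes in the last bin.
    counts = [0] * bins
    for p in pvalues:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p values must lie in [0, 1]")
        counts[min(int(p * bins), bins - 1)] += 1

    # Scan left to right. The last bin always satisfies the condition,
    # because its count equals its own tail mean.
    for i in range(bins):
        tail = counts[i:]
        tail_mean = sum(tail) / len(tail)
        if counts[i] <= tail_mean:
            return bins * tail_mean


# Hypothetical data: 100 evenly spread "null" p values plus 50 very small ones.
null_like = [(k + 0.5) / 100 for k in range(100)]
mixed = null_like + [0.001] * 50
estimate = histogram_m0_estimate(mixed, bins=20)
```

Under this assumed rule, the excess of small p values in the leftmost bin is excluded from the tail average, so the estimate reflects only the roughly flat part of the histogram that true null hypotheses are expected to produce.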
