Testing the disjunction hypothesis using Voronoi diagrams with applications to genetics

Testing the disjunction hypothesis is appropriate when each gene or location studied is associated with multiple $p$-values, each of which is of individual interest. This can occur when more than one aspect of an underlying process is measured. For example, cancer researchers may hope to detect genes that are both differentially expressed at the transcriptomic level and show evidence of copy number aberration. Currently used methods of $p$-value combination for this setting are overly conservative, resulting in very low power for detection. In this work, we introduce a method to test the disjunction hypothesis by using cumulative areas from the Voronoi diagram of two-dimensional vectors of $p$-values. Our method offers substantially higher power than existing methods, even in challenging situations, while maintaining appropriate error control. We apply the approach to data from two published studies: the first aims to detect periodic genes of the organism Schizosaccharomyces pombe, and the second aims to identify genes associated with prostate cancer.
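
To illustrate the conservativeness mentioned above, the following is a minimal sketch and not the Voronoi-based method itself. It assumes that the maximum of the two $p$-values is one of the existing combination rules in question, and that both $p$-values are independent and uniform when neither effect is present. The function names `max_p` and `rejection_rate` are hypothetical and chosen only for this example.

```python
import random


def max_p(p1, p2):
    """Combine two p-values by taking their maximum.

    Assumption: this stands in for an existing combination rule; it rejects
    only when BOTH component p-values fall below the threshold.
    """
    return max(p1, p2)


def rejection_rate(n_genes, alpha, seed=0):
    """Simulate genes with no effect in either measurement.

    Both p-values are drawn independently from Uniform(0, 1), and the
    fraction of genes rejected at level alpha is returned.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_genes):
        p1, p2 = rng.random(), rng.random()
        if max_p(p1, p2) <= alpha:
            rejections += 1
    return rejections / n_genes


alpha = 0.05
rate = rejection_rate(100_000, alpha)
print(f"nominal level: {alpha}, empirical rejection rate: {rate:.4f}")
```

Under these assumptions the probability that both $p$-values fall below $\alpha$ is $\alpha^2$, so the empirical rejection rate should sit near 0.0025 rather than the nominal 0.05. That gap is the sense in which such a rule is conservative, and it comes at the cost of power.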
