A compression algorithm for pre-simulated Monte Carlo p-value functions: Application to the ontological analysis of microarray studies

Monte Carlo simulation is frequently used to compute p-values for test statistics whose null distributions are unknown. Such computations can be exceedingly time-consuming, and in those cases pre-computed simulations can be used to increase speed. This approach is attractive in principle but complicated in practice, because the pre-computed data can be prohibitively large. We developed an algorithm for computing size-reduced representations of Monte Carlo p-value functions. We show that, in typical settings, this algorithm reduces the size of the pre-computed data by several orders of magnitude while provably bounding the approximation error at an explicitly controllable level. The algorithm is data-independent, fully non-parametric, and easy to implement. We demonstrate its practical utility by applying it to the threshold-free ontological analysis of microarray data. The presented algorithm simplifies the use of pre-computed Monte Carlo p-value functions in software, including specialized bioinformatics applications.
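The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes that the Monte Carlo p-value of an observed statistic t is estimated as (#{x >= t} + 1) / (N + 1) from N simulated null statistics, and that a relative error tolerance `eps` is acceptable. The function names (`exact_pvalue`, `compress`, `compressed_pvalue`) and the geometric spacing of retained ranks are assumptions made for this example.

```python
import bisect
import math
import random


def exact_pvalue(null_sorted, t):
    """Monte Carlo p-value (#{x >= t} + 1) / (N + 1) from the full sorted sample."""
    n = len(null_sorted)
    n_ge = n - bisect.bisect_left(null_sorted, t)  # number of draws >= t
    return (n_ge + 1) / (n + 1)


def compress(null_sorted, eps):
    """Keep only a geometrically spaced subset of ranks (hypothetical scheme).

    Returns (cuts, pvals), both ordered so that cuts is ascending.
    The reported p-value is never below the exact one and exceeds it
    by at most a factor of (1 + eps).
    """
    n = len(null_sorted)
    ranks = [1]
    while ranks[-1] < n + 1:
        # Next retained rank: as far as the tolerance allows, at least one step.
        nxt = min(n + 1, math.floor((1 + eps) * (ranks[-1] + 1)))
        ranks.append(nxt)
    # The exact rank is <= r exactly when t > null_sorted[n - r];
    # the largest rank, n + 1, always applies, hence the -inf cut.
    cuts = [null_sorted[n - r] if r <= n else -math.inf for r in ranks]
    cuts.reverse()
    ranks.reverse()
    return cuts, [r / (n + 1) for r in ranks]


def compressed_pvalue(cuts, pvals, t):
    """Look up the smallest retained p-value whose cut lies strictly below t."""
    j = bisect.bisect_left(cuts, t) - 1  # last index with cuts[j] < t
    return pvals[j]


if __name__ == "__main__":
    random.seed(0)
    # Stand-in null distribution; a real application would simulate its own statistic.
    null = sorted(random.gauss(0.0, 1.0) for _ in range(20000))
    cuts, pvals = compress(null, eps=0.05)
    print(len(null), "draws reduced to", len(cuts), "knots")
    print(exact_pvalue(null, 2.5), compressed_pvalue(cuts, pvals, 2.5))
```

In this sketch the retained ranks grow roughly geometrically, so the number of stored knots scales with log(N) / log(1 + eps) instead of N, and the lookup is conservative by construction. The scheme depends only on N and `eps`, not on the simulated values themselves, which is one possible reading of "data-independent"; the published algorithm may differ in every detail.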
