Estimating p-values in small microarray experiments

MOTIVATION Microarray data typically have small numbers of observations per gene, which can result in low power for statistical tests. Test statistics that borrow information from data across all of the genes can improve power, but these statistics have non-standard distributions, and their significance must be assessed using permutation analysis. When sample sizes are small, the number of distinct permutations can be severely limited, and pooling the permutation-derived test statistics across all genes has been proposed. However, the null distribution of the test statistics under permutation is not the same for equally and differentially expressed genes. This can have a negative impact on both p-value estimation and the power of information borrowing statistics. RESULTS We investigate permutation based methods for estimating p-values. One of methods that uses pooling from a selected subset of the data are shown to have the correct type I error rate and to provide accurate estimates of the false discovery rate (FDR). We provide guidelines to select an appropriate subset. We also demonstrate that information borrowing statistics have substantially increased power compared to the t-test in small experiments.

[1]  Ingrid Lönnstedt Replicated microarray data , 2001 .

[2]  John D. Storey A direct approach to false discovery rates , 2002 .

[3]  Jianqing Fan,et al.  Removing intensity effects and identifying significant genes for Affymetrix arrays in macrophage migration inhibitory factor-suppressed neuroblastoma cells. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Wei Pan,et al.  Gene expression A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data , 2005 .

[5]  R Fisher,et al.  Design of Experiments , 1936 .

[6]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[7]  John D. Storey The positive false discovery rate: a Bayesian interpretation and the q-value , 2003 .

[8]  Gordon K Smyth,et al.  Statistical Applications in Genetics and Molecular Biology Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2011 .

[9]  Pierre Baldi,et al.  A Bayesian framework for the analysis of microarray expression data: regularized t -test and statistical inferences of gene changes , 2001, Bioinform..

[10]  X. Cui,et al.  Improved statistical tests for differential gene expression by shrinking variance components estimates. , 2005, Biostatistics.

[11]  Hao Wu,et al.  MAANOVA: A Software Package for the Analysis of Spotted cDNA Microarray Experiments , 2003 .

[12]  Wei Pan,et al.  On the Use of Permutation in and the Performance of A Class of Nonparametric Methods to Detect Differential Gene Expression , 2003, Bioinform..

[13]  Ross Ihaka,et al.  Gentleman R: R: A language for data analysis and graphics , 1996 .

[14]  John D. Storey,et al.  SAM Thresholding and False Discovery Rates for Detecting Differential Gene Expression in DNA Microarrays , 2003 .

[15]  Scott L. Zeger,et al.  The Analysis of Gene Expression Data: An Overview of Methods and Software , 2003 .

[16]  T. Speed,et al.  Statistical issues in cDNA microarray data analysis. , 2003, Methods in molecular biology.

[17]  W. Cleveland Robust Locally Weighted Regression and Smoothing Scatterplots , 1979 .