Paper information - Kernel estimation for adjusted p

Kernel estimation for adjusted p

Multiple testing procedures are frequently applied in biomedical and genomic research, for instance, to identify differentially expressed genes in microarray experiments. Resampling methods are commonly used to compute adjusted p-values in multiple hypothesis testing problems. Importantly, resampling-based multiple testing procedures are sensitive to the number of permutations, and the MinP adjustment procedure is especially so. The single-step MinP adjusted p-values are derived from the distribution of the minimum of the p-values. Because of the computational complexity of MinP, adjusted p-values are often computed instead from the distribution of the maximum of the test statistics (MaxT). This paper proposes an approach based on the kernel density estimation (KDE) technique to reduce the number of permutations needed to implement the single-step MinP adjustment. Simulation studies demonstrate that the KDE method is more powerful than the MinP adjustment method under both independent and correlated models. The three resampling-based single-step adjustment procedures, MaxT, MinP, and KDE, are applied to two published microarray data sets: the colon tumor data set, consisting of 40 tumor and 22 normal colon tissue samples on 2000 human genes (endpoints), and the leukemia data set, consisting of 27 acute lymphoblastic leukemia and 11 acute myeloid leukemia samples on 3051 genes. The MaxT adjusted p-values are very robust to the number of permutations and are stable with 10,000 permutations, while the MinP adjusted p-values are step functions. As the number of permutations increases, the number of ties decreases, and the MinP adjusted p-values are stable with 500,000 permutations. For the KDE method, the adjusted p-values are stable at 50,000 permutations. At 1,000,000 permutations, the three procedures give similar adjusted p-values.
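The two permutation-based adjustments named in the abstract, single-step MaxT and single-step MinP, can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the KDE smoothing step is not shown because the abstract does not specify it. The function names (`welch_t`, `single_step_adjust`), the choice of a Welch t-statistic, and the two-group 0/1 labeling are all assumptions made for illustration.

```python
import numpy as np

def welch_t(x, labels):
    """Welch two-sample t-statistic for every row (gene) of x.
    Assumption: labels is a 0/1 array with one entry per column (sample)."""
    a, b = x[:, labels == 1], x[:, labels == 0]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (a.mean(axis=1) - b.mean(axis=1)) / se

def single_step_adjust(x, labels, n_perm=1000, seed=0):
    """Return (raw, MaxT-adjusted, MinP-adjusted) permutation p-values.
    Illustrative sketch only: random label permutations, two-sided test."""
    rng = np.random.default_rng(seed)
    m = x.shape[0]
    t_obs = np.abs(welch_t(x, labels))

    # Permutation null: one row of |t| statistics per shuffled labeling.
    t_perm = np.empty((n_perm, m))
    for b in range(n_perm):
        t_perm[b] = np.abs(welch_t(x, rng.permutation(labels)))

    # Raw p-value counts: permutations whose statistic reaches the observed one.
    cnt_raw = (t_perm >= t_obs[None, :]).sum(axis=0)

    # MaxT: compare each observed statistic with the permutation
    # distribution of the maximum statistic over all genes.
    max_t = t_perm.max(axis=1)
    p_maxt = (max_t[None, :] >= t_obs[:, None]).mean(axis=1)

    # MinP: first turn every permuted statistic into a p-value count
    # within its own gene, then use the distribution of the minimum.
    cnt_perm = np.empty((n_perm, m), dtype=np.int64)
    for j in range(m):
        col_sorted = np.sort(t_perm[:, j])
        n_smaller = np.searchsorted(col_sorted, t_perm[:, j], side="left")
        cnt_perm[:, j] = n_perm - n_smaller  # number of values >= this one
    min_cnt = cnt_perm.min(axis=1)
    p_minp = (min_cnt[None, :] <= cnt_raw[:, None]).mean(axis=1)

    return cnt_raw / n_perm, p_maxt, p_minp
```

Because every p-value in this sketch is a count divided by `n_perm`, the MinP adjusted values can only take values on a coarse grid when `n_perm` is small. That is consistent with the step-function behavior and the ties the abstract describes, and it is the motivation the abstract gives for smoothing with KDE.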

James J. Chen | Chen-An Tsai
