The L1-Version of the Cramér-von Mises Test for Two-Sample Comparisons in Microarray Data Analysis

Distribution-free statistical tests offer clear advantages in situations where the exact unadjusted p-values are required as input for multiple testing procedures. Such situations prevail when testing for differential expression of genes in microarray studies. The Cramér-von Mises two-sample test, based on a certain L2-distance between two empirical distribution functions, is a distribution-free test that has proven itself as a good choice. A numerical algorithm is available for computing quantiles of the sampling distribution of the Cramér-von Mises test statistic in finite samples. However, the computation is very time- and space-consuming. An L1 counterpart of the Cramér-von Mises test represents an appealing alternative. In this work, we present an efficient algorithm for computing exact quantiles of the L1-distance test statistic. The performance and power of the L1-distance test are compared with those of the Cramér-von Mises and two other classical tests, using both simulated data and a large set of microarray data on childhood leukemia. The L1-distance test appears to be nearly as powerful as its L2 counterpart. The lower computational intensity of the L1-distance test allows computation of exact quantiles of the null distribution for larger sample sizes than is possible for the Cramér-von Mises test.
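
To make the idea of a distance between two empirical distribution functions concrete, the following is a minimal Python sketch, not the efficient algorithm presented in this work. It assumes an unnormalized form of the statistic (the sum, over all pooled observations, of the absolute or squared difference between the two empirical distribution functions) and obtains an exact p-value by brute-force enumeration of all splits of the pooled sample, which is feasible only for very small samples. The function names `ecdf_distance` and `exact_p_value`, the `power` parameter, and the omitted normalizing constant are illustrative assumptions.

```python
from itertools import combinations


def ecdf_distance(x, y, power=1):
    """Unnormalized distance between the two empirical distribution functions.

    Sums |F_m(z) - G_n(z)|**power over every point z of the pooled sample.
    power=1 gives an L1-type statistic, power=2 a Cramer-von Mises-type one.
    The normalizing constant is omitted (an assumption of this sketch).
    """
    x, y = list(x), list(y)
    m, n = len(x), len(y)
    total = 0.0
    for z in sorted(x + y):
        f = sum(v <= z for v in x) / m  # empirical CDF of the first sample at z
        g = sum(v <= z for v in y) / n  # empirical CDF of the second sample at z
        total += abs(f - g) ** power
    return total


def exact_p_value(x, y, power=1):
    """Exact permutation p-value by enumerating all C(m+n, m) splits.

    Brute force for illustration only; the cost grows combinatorially.
    """
    pooled = list(x) + list(y)
    m = len(x)
    observed = ecdf_distance(x, y, power)
    extreme = splits = 0
    for idx in combinations(range(len(pooled)), m):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        splits += 1
        # Small tolerance guards against floating-point ties.
        if ecdf_distance(xs, ys, power) >= observed - 1e-12:
            extreme += 1
    return extreme / splits
```

With two fully separated samples of size three, such as `[1, 2, 3]` and `[4, 5, 6]`, only 2 of the 20 possible splits are as extreme as the observed one, so `exact_p_value` returns 0.1 for either choice of `power`. Because the statistic depends on the data only through ranks, its null distribution is the same for every continuous underlying distribution, which is what makes exact unadjusted p-values available.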
