A C++ Program for the Cramér-von Mises Two-Sample Test

As larger sets of high-throughput data in genomics and proteomics become readily available, there is a growing need for fast algorithms that compute exact p-values of distribution-free statistical tests. We present a program that computes the exact distribution of the two-sample Cramér-von Mises test statistic under the null hypothesis that both samples are drawn from the same continuous distribution. The program handles substantially larger sample sizes than previously proposed computational tools. The C++ source code for the program is published with this paper, and an R package is under development.
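The abstract does not describe the algorithm itself. Below is a minimal sketch of one standard way to get the exact null distribution; it is not the published program. It assumes no ties, so under the null hypothesis each of the C(N, m) interleavings of the two samples (N = m + n) is equally likely.

Each interleaving is a lattice path from (0, 0) to (m, n). At step k, i_k counts the x's and j_k counts the y's among the k smallest pooled values. The statistic T = mn/N^2 · Σ_k (i_k/m − j_k/n)^2 can then be written as T = V / (m·n·N^2), where V = Σ_k (i_k·n − j_k·m)^2 is an integer. A dynamic program over the lattice accumulates, at every point, how many paths reach each partial value of V.

The function names `cvm_null_counts`, `cvm_p_value` and `cvm_statistic_v` are hypothetical. Counts are held in `long double`, so they are exact only while they fit in its mantissa.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: exact null distribution of the integer statistic
//   V = sum_{k=1}^{N} (i_k * n - j_k * m)^2,  with T = V / (m * n * N^2).
// Returns a map V -> number of interleavings (out of C(N, m)) yielding V.
std::map<std::int64_t, long double> cvm_null_counts(int m, int n) {
    assert(m > 0 && n > 0);
    using Dist = std::map<std::int64_t, long double>;
    // prev[j] holds the distribution at lattice point (i-1, j); cur[j] at (i, j).
    std::vector<Dist> prev(n + 1), cur(n + 1);
    for (int i = 0; i <= m; ++i) {
        for (int j = 0; j <= n; ++j) {
            cur[j].clear();
            if (i == 0 && j == 0) { cur[0][0] = 1.0L; continue; }
            const std::int64_t d = static_cast<std::int64_t>(i) * n -
                                   static_cast<std::int64_t>(j) * m;
            const std::int64_t step = d * d;  // contribution of point (i, j)
            if (i > 0)  // arrive by placing an x
                for (const auto& [v, c] : prev[j]) cur[j][v + step] += c;
            if (j > 0)  // arrive by placing a y
                for (const auto& [v, c] : cur[j - 1]) cur[j][v + step] += c;
        }
        std::swap(prev, cur);  // row i becomes the "previous" row
    }
    return prev[n];  // distribution at (m, n)
}

// Hypothetical helper: exact upper-tail p-value P(V >= v_obs) under H0.
long double cvm_p_value(int m, int n, std::int64_t v_obs) {
    long double total = 0.0L, tail = 0.0L;
    for (const auto& [v, c] : cvm_null_counts(m, n)) {
        total += c;
        if (v >= v_obs) tail += c;
    }
    return tail / total;
}

// Hypothetical helper: observed V for samples x and y (assumes no ties).
std::int64_t cvm_statistic_v(std::vector<double> x, std::vector<double> y) {
    const std::int64_t m = static_cast<std::int64_t>(x.size());
    const std::int64_t n = static_cast<std::int64_t>(y.size());
    std::sort(x.begin(), x.end());
    std::sort(y.begin(), y.end());
    std::size_t a = 0, b = 0;  // number of x's / y's passed so far
    std::int64_t v = 0;
    while (a < x.size() || b < y.size()) {
        // Step past the next smallest pooled observation.
        if (b == y.size() || (a < x.size() && x[a] < y[b])) ++a; else ++b;
        const std::int64_t d = static_cast<std::int64_t>(a) * n -
                               static_cast<std::int64_t>(b) * m;
        v += d * d;
    }
    return v;
}
```

For example, with x = {1.0, 2.0} and y = {3.0}, `cvm_statistic_v` gives V = 5, so T = 5/18. Two of the three equally likely interleavings reach V ≥ 5, so `cvm_p_value(2, 1, 5)` is 2/3.

The sketch keeps only two rows of the lattice in memory. Its cost is driven by the number of distinct V values stored at each point.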
