USING NON-PARAMETRIC METHODS IN THE CONTEXT OF MULTIPLE TESTING TO DETERMINE DIFFERENTIALLY EXPRESSED GENES

We focus on the Golub et al. ALL/AML oligonucleotide array data set [Golub et al., 1999] and on the problem of identifying genes that are differentially expressed between pairs of sample types. Specifically, we use this data set to analyze methods for finding genes that are likely to be differentially expressed between ALL T-cells and ALL B-cells. To this end, we employ non-parametric methods that, in the context of multiple testing, attach statistical measures of confidence to genes predicted to be differentially expressed. In particular, we apply the method developed by Dudoit et al. [Dudoit et al., 2000], which computes a t-statistic for each gene, obtains p-values through permutations, and corrects for multiple testing with the Westfall and Young step-down approach. We also use PaGE [Manduchi et al., 2000], developed at PCBI, which assigns confidence to predictions by calculating false-positive rates directly from empirical “gene-independent” distributions. We compare the performance of these methods on the Golub et al. data. We also exploit the large sample size of this data set to analyze how the number of observations affects a variety of issues related to predicting differential expression, in particular the reproducibility of results. In addition, we investigate the use of shifted intensities for data such as the Golub et al. data set, as well as the use of “absent calls” in oligonucleotide array data.
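The permutation-based step-down procedure referred to above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Welch two-sample t-statistics, draws random label permutations instead of enumerating all of them, and computes Westfall and Young step-down maxT adjusted p-values. The function names (welch_t, t_stats, maxt_adjusted_pvalues) and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch two-sample t-statistic (class 1 mean minus class 0 mean)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)
    if se == 0.0:
        # Both groups constant: infinite t if the means differ, else no evidence.
        return 0.0 if mx == my else math.copysign(math.inf, mx - my)
    return (mx - my) / se


def t_stats(data: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Welch t for every gene (row); labels gives class 0/1 for each sample (column)."""
    stats = []
    for row in data:
        x = [v for v, lab in zip(row, labels) if lab == 1]
        y = [v for v, lab in zip(row, labels) if lab == 0]
        stats.append(welch_t(x, y))
    return stats


def maxt_adjusted_pvalues(data, labels, n_perm=1000, seed=0) -> List[float]:
    """Hypothetical helper: Westfall-Young step-down maxT adjusted p-values
    estimated from random permutations of the class labels."""
    rng = random.Random(seed)
    observed = [abs(t) for t in t_stats(data, labels)]
    m = len(observed)
    # Rank genes from most to least significant by observed |t|.
    order = sorted(range(m), key=lambda j: observed[j], reverse=True)
    counts = [0] * m
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # random relabelling of the samples
        permuted = [abs(t) for t in t_stats(data, perm)]
        # Successive maxima: running max of permuted |t| over the gene at
        # rank i and all less significant genes (ranks i+1, ..., m-1).
        running = 0.0
        for i in range(m - 1, -1, -1):
            running = max(running, permuted[order[i]])
            if running >= observed[order[i]]:
                counts[i] += 1
    adjusted = [c / n_perm for c in counts]
    # Enforce monotonicity: a more significant gene never gets a larger p-value.
    for i in range(1, m):
        adjusted[i] = max(adjusted[i], adjusted[i - 1])
    # Map back from rank order to the original gene order.
    result = [0.0] * m
    for i, j in enumerate(order):
        result[j] = adjusted[i]
    return result


# Toy example (hypothetical data): gene 0 is shifted between the two classes,
# genes 1 and 2 are not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
data = [
    [10, 11, 12, 13, 1, 2, 3, 4],
    [5, 6, 7, 6, 6, 7, 5, 6],
    [2, 3, 4, 3, 3, 4, 2, 3],
]
adj_p = maxt_adjusted_pvalues(data, labels, n_perm=1000, seed=1)
print(adj_p)
```

Comparing each gene's observed |t| with the maximum permuted |t| over that gene and all less significant genes is what makes the procedure "step-down": the adjustment is most stringent for the top-ranked gene and relaxes as genes are removed. With only 4 samples per class there are 70 distinct labelings, so a real analysis would enumerate them or use many more permutations.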