MOTIVATION
The simplest level of statistical analysis of cancer-associated gene expression matrices aims to find genes that are consistently up- or down-regulated within a given set of tumor samples. Given the high diversity of gene expression detected in cancer, one needs to assess the probability that the consistent mis-regulation of a given gene is due to chance. Furthermore, it is important to determine the number of samples required to ensure a statistically meaningful analysis of massively parallel gene expression measurements.
RESULTS
In this paper, the probability of consistent mis-regulation is calculated for binarized gene expression data using combinatorial considerations. For practical purposes, we also provide a set of accurate approximate formulas that determine the same probability at lower computational cost. When the pool of mis-regulatable genes is restricted, the probability of consistent mis-regulation can be overestimated. We show, however, that this effect has little practical consequence for the cancer-associated gene expression measurements published in the literature. Finally, to aid experimental design, we provide estimates of the number of samples required to ensure that a detected consistent mis-regulation is not due to chance. Our results suggest that fewer than 20 sufficiently diverse tumor samples may be enough to identify consistently mis-regulated genes in a statistically significant manner.
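To make the kind of calculation described above concrete, the following is a minimal sketch under a simplified toy model. It is not the paper's Equation (4). The assumptions are ours: each of `m_samples` samples flags exactly `k_misreg` of `n_genes` genes as mis-regulated, uniformly at random and independently of the other samples. The function names and parameters are hypothetical. The first function computes the exact probability, by inclusion-exclusion, that at least one gene is flagged in every sample by chance; the second uses a cheaper approximation that treats genes as independent; the third searches for the smallest sample number that pushes the approximate chance probability below a threshold.

```python
from fractions import Fraction
from math import comb


def prob_chance_overlap(n_genes, k_misreg, m_samples):
    """Exact P(at least one gene is flagged in all samples) in the toy model.

    Assumption (ours, not the paper's): every sample flags exactly
    k_misreg of n_genes genes, uniformly at random and independently.
    """
    denom = comb(n_genes, k_misreg)
    total = Fraction(0)
    for j in range(1, k_misreg + 1):
        # Probability that j specific genes are all flagged in one sample.
        p_j = Fraction(comb(n_genes - j, k_misreg - j), denom)
        # Inclusion-exclusion over sets of j genes shared by every sample.
        total += (-1) ** (j + 1) * comb(n_genes, j) * p_j ** m_samples
    return total


def prob_chance_overlap_approx(n_genes, k_misreg, m_samples):
    """Cheaper approximation that treats genes as independent."""
    p_gene = (k_misreg / n_genes) ** m_samples  # one gene flagged in all samples
    return 1.0 - (1.0 - p_gene) ** n_genes


def min_samples(n_genes, k_misreg, alpha=0.05, max_samples=100):
    """Smallest sample number whose approximate chance probability is below alpha."""
    for m in range(1, max_samples + 1):
        if prob_chance_overlap_approx(n_genes, k_misreg, m) < alpha:
            return m
    return None  # no sample number up to max_samples is sufficient
```

The exact sum uses rational arithmetic to avoid cancellation errors in the alternating series, which makes it slow for realistic gene counts; that cost is the motivation for an approximation of the second kind. A call such as `min_samples(10000, 1000)` illustrates how the required sample number could be estimated under these toy assumptions.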
AVAILABILITY
A Mathematica (tm) implementation of the main equation of the paper, Equation (4), is available at www.me.chalmers.se/~mwahde/bioinfo.html.