Two-color DNA microarray data has proven valuable in high-throughput expression profiling. However, microarray expression ratios (log2ratios) are subject to measurement error from multiple causes. Transcript abundance is expected to be a linear function of signal intensity (y = x), given that the typical gene is nonresponsive. Once linearity is confirmed, applying the model by fitting log-scale data with simple linear regression reduces the standard deviation of the log2ratios, after which fewer genes are selected by filtering methods. Comparing the residuals of the regression to leverage measures can identify the best candidate genes. Spatial bias in the log2ratio, defined by printing pin and detected by ANOVA, can be another source of measurement error. Independently applying the linear normalization method to the data from each pin can easily eliminate this error. Less easily addressed is the problem of cross-homology, which is expected to correlate with cross-hybridization. Pair-wise comparison of genes demonstrates that genes with similar sequences are measured as having similar expression. While this bias cannot be easily eliminated, the effect of this probable cross-hybridization can be minimized in clustering by the weighting methods introduced here.

Introduction to the Iterative Method

Empirical observations, validated by statistical tests, indicate that distinct classes of measurement error alter cDNA microarray data. When these measurement errors are detectable and conform to defined models, corrections can be applied during renormalization. However, supporting biological evidence may be required to validate any normalization method. For the Spellman and Sherlock cell cycle data [3], the spatial and signal-intensity-dependent measurement errors were corrected through renormalization. Re-analysis of the Spellman and Sherlock cell cycle data set begins with a new method of normalization that more effectively reduces the effects of outliers and spatial variation on the arrays.
First, all background-corrected signal intensity values are log transformed. Then linear regression is performed in which the signal intensities of one channel are predicted from those of the other channel. Spatial error is corrected by performing this regression independently for each sector. The sectors were produced by slotted printing pins; the microarrays used in the Spellman experiments had four sectors printed with four distinct pins.
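The normalization steps above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `normalize_by_sector`, the choice of the red channel as the predicted channel, the use of the regression residual as the renormalized log2ratio, and the synthetic four-sector data are all hypothetical.

```python
import numpy as np


def normalize_by_sector(red, green, sector):
    """Fit log2(red) = a + b * log2(green) separately in each sector.

    Returns the regression residuals, used here (as an assumption) as the
    renormalized log2ratios. Inputs are background-corrected, positive
    signal intensities and a sector (printing pin) label per spot.
    """
    log_r = np.log2(np.asarray(red, dtype=float))
    log_g = np.log2(np.asarray(green, dtype=float))
    sector = np.asarray(sector)

    normalized = np.empty_like(log_r)
    for s in np.unique(sector):
        mask = sector == s
        # Simple linear regression on the log scale for this sector only.
        slope, intercept = np.polyfit(log_g[mask], log_r[mask], 1)
        normalized[mask] = log_r[mask] - (intercept + slope * log_g[mask])
    return normalized


# Synthetic example: four sectors, each with its own hypothetical offset.
rng = np.random.default_rng(0)
sector = np.repeat(np.arange(4), 200)
log_g = rng.uniform(6, 14, size=sector.size)
offsets = np.array([0.0, 0.4, -0.3, 0.2])
log_r = offsets[sector] + 0.95 * log_g + rng.normal(0, 0.1, size=sector.size)

green = 2.0 ** log_g
red = 2.0 ** log_r

raw_ratio = np.log2(red / green)
normalized = normalize_by_sector(red, green, sector)

print("SD of raw log2ratios:       ", raw_ratio.std())
print("SD of normalized log2ratios:", normalized.std())
```

Because each sector is fitted with its own intercept and slope, sector-specific offsets are absorbed by the fit, so the residuals within every sector are centered on zero.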
[1] W. Kruskal, et al. Use of Ranks in One-Criterion Variance Analysis. 1952.
[2] V. Barnett, et al. Applied Linear Statistical Models. 1975.
[3] Michael Ruogu Zhang, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 1998.
[4] R. Tibshirani, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 2001.
[5] Terence P. Speed, et al. Normalization for cDNA microarray data. SPIE BiOS, 2001.
[6] Jin Hyun Park, et al. Normalization for cDNA Microarray Data on the oral cancer. 2002.