Effects of spot and background defects on quantitative data from spotted microarrays

We use simulations that generate realistic spotted microarray image data to study the effects of broken spots and background defects on the numerical values obtained from such images. Our simulation uses gene, spot morphology, and background properties derived from a published microarray study to generate replicate images. We generated simulated datasets for several cases with high and low severity of spot breaks and high and low severity of background defects, with one hundred replicate images for each case. Each spot on each replicate image was quantified for "gene expression" using a spot-finding technique similar to those found in commercial software, and statistics were computed for each gene over the 100 replicates. We found that spot defects had little consistent effect on the numerical values, except to slightly reduce the values for high-expressing genes. Background defects, by contrast, had significant effects throughout: they increased mean values for low-expressing genes, increased variance for both low- and high-expressing genes, and altered the skewness of the distribution of numerical values, frequently changing negatively skewed distributions to positively skewed ones. We conclude that background defects are likely to have the greatest effect on the accuracy of microarray data and should be avoided through experimental protocols that enforce careful handling of slides.
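As a rough illustration of the per-gene statistics described above (mean, variance, and skewness over replicates), the following Python sketch may help. It is not the simulation or quantification method used in the study: the expression levels, the noise model, and the additive "background defect" term are all hypothetical stand-ins, and the function name per_gene_stats is introduced here only for illustration.

```python
import numpy as np

def per_gene_stats(values):
    """Per-gene mean, sample variance, and skewness.

    values: array of shape (n_replicates, n_genes), one quantified
    value per gene per replicate image.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    var = values.var(axis=0, ddof=1)           # sample variance over replicates
    centered = values - mean
    m2 = (centered ** 2).mean(axis=0)          # second central moment
    m3 = (centered ** 3).mean(axis=0)          # third central moment
    # Moment-based skewness; defined as 0 where a gene has no spread.
    with np.errstate(divide="ignore", invalid="ignore"):
        skew = np.where(m2 > 0, m3 / m2 ** 1.5, 0.0)
    return mean, var, skew

rng = np.random.default_rng(0)
n_replicates, n_genes = 100, 5

# Hypothetical "true" expression levels, from low to high.
true_level = np.array([50.0, 100.0, 500.0, 2000.0, 10000.0])

# Assumed measurement noise: Gaussian, proportional to expression level.
clean = true_level + rng.normal(0.0, 0.05 * true_level,
                                size=(n_replicates, n_genes))

# Assumed background defect: occasional non-negative additive contamination.
hit = rng.random((n_replicates, n_genes)) < 0.1
contaminated = clean + hit * rng.exponential(200.0,
                                             size=(n_replicates, n_genes))

mean_c, var_c, skew_c = per_gene_stats(clean)
mean_d, var_d, skew_d = per_gene_stats(contaminated)

for g in range(n_genes):
    print(f"gene {g}: mean {mean_c[g]:.1f} -> {mean_d[g]:.1f}, "
          f"var {var_c[g]:.1f} -> {var_d[g]:.1f}, "
          f"skew {skew_c[g]:.2f} -> {skew_d[g]:.2f}")
```

Because the assumed contamination is purely additive and non-negative, it can only raise the per-gene means in this toy model, and its relative impact is largest for the low-expressing genes. Real image-level defects interact with spot finding and background subtraction in ways this sketch does not capture.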