Correcting for signal saturation errors in the analysis of microarray data.

A variety of technical errors can arise in the analysis of data from cDNA or oligonucleotide microarrays. One of the most insidious is saturation of the hybridization signal for highly abundant transcripts, which results from truncation of the laser fluorescence signal at the upper limit the scanner can record. When the hybridization signal on the microarray is very strong, this truncation can seriously distort the measurements in ways that may not be readily apparent to the user. To illustrate the problem, two subclasses of normal human tissue samples (six liver and six lung samples) were analyzed with GeneChip probe arrays to evaluate the expression patterns of approximately 7000 human genes. Five of these data sets were found to suffer from signal truncation, which caused several tissues to be misclassified by hierarchical clustering. To rectify this problem so that the gene expression data could be properly compared and clustered, we developed a "filtering" procedure that identifies a subset of genes least affected by signal saturation. This filtering procedure is available at www.hugeindex.org.
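The abstract does not state the filtering criterion, so the following is only a minimal sketch of one plausible approach, not the authors' published procedure. It assumes that saturated signals pile up at a fixed scanner ceiling (the hypothetical `SCANNER_CEILING`, set to 65535 merely because that is the 16-bit maximum), flags arrays in which more than a small fraction of probes sit at that ceiling (the hypothetical `max_fraction`), and keeps only genes whose intensity stays below a safety margin of the ceiling in every sample (the hypothetical `margin=0.8`). All function names, thresholds, and the toy values in the usage block are illustrative.

```python
from typing import Dict, List

# ASSUMPTION: the largest intensity the scanner can record. The real value
# depends on the scanner and its settings; 65535 (16-bit maximum) is a placeholder.
SCANNER_CEILING = 65535.0


def saturated_fraction(intensities: List[float], ceiling: float = SCANNER_CEILING) -> float:
    """Return the fraction of intensities on one array at or above the ceiling."""
    if not intensities:
        return 0.0
    at_ceiling = sum(1 for x in intensities if x >= ceiling)
    return at_ceiling / len(intensities)


def flag_saturated_arrays(
    arrays: Dict[str, List[float]],
    max_fraction: float = 0.001,  # hypothetical tolerance for truncated probes
    ceiling: float = SCANNER_CEILING,
) -> List[str]:
    """Return the names of arrays whose truncated fraction exceeds max_fraction."""
    return sorted(
        name for name, values in arrays.items()
        if saturated_fraction(values, ceiling) > max_fraction
    )


def filter_unsaturated_genes(
    expression: Dict[str, List[float]],
    ceiling: float = SCANNER_CEILING,
    margin: float = 0.8,  # hypothetical safety margin below the ceiling
) -> Dict[str, List[float]]:
    """Keep genes whose signal stays below margin * ceiling in every sample.

    Under the assumed ceiling, such a gene was never truncated, so
    comparisons restricted to these genes are not distorted by saturation.
    """
    threshold = margin * ceiling
    return {gene: values for gene, values in expression.items() if max(values) < threshold}


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    arrays = {
        "array_1": [100.0, 2000.0, 5000.0, 800.0],
        "array_2": [100.0, 65535.0, 65535.0, 2000.0],
    }
    print(flag_saturated_arrays(arrays))  # ['array_2']

    expression = {
        "GENE_A": [1200.0, 1500.0, 900.0],
        "GENE_B": [60000.0, 65535.0, 65535.0],  # truncated in two samples
        "GENE_C": [20000.0, 35000.0, 28000.0],
    }
    print(sorted(filter_unsaturated_genes(expression)))  # ['GENE_A', 'GENE_C']
```

Requiring every sample to stay below the margin is deliberately conservative: it discards any gene that is high in even one tissue, which shrinks the gene set used for clustering but ensures that, under the assumed ceiling, none of the retained measurements were truncated.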