Iterative Linear Regression by Sector

Two-color DNA microarray data has proven valuable in high-throughput expression profiling. However, microarray expression ratios (logbase2ratios) are subject to measurement error from multiple causes. Transcript abundance is expected to be a linear function of signal intensity, and for the typical gene, which is non-responsive, the two channels should agree (y = x). Once linearity is confirmed, fitting the log-scale data with simple linear regression reduces the standard deviation of the logbase2ratios, so that fewer genes are selected by filtering methods. Comparing the regression residuals with leverage measures can identify the best candidate genes. Spatial bias in the logbase2ratios, grouped by printing pin and detected by ANOVA, can be another source of measurement error. This error can be eliminated easily by applying the linear normalization independently to the data from each pin. Less easily addressed is the problem of cross-homology, which is expected to correlate with cross-hybridization. Pairwise comparison of genes demonstrates that genes with similar sequences are measured as having similar expression. While this bias cannot be easily eliminated, the effect of this probable cross-hybridization on clustering can be minimised by the weighting methods introduced here.
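A minimal sketch of the regression normalization step, in Python, follows. It assumes background-corrected red and green intensities for the same spots. It fits log2(red) on log2(green) by ordinary least squares and takes each spot's residual about that line as its normalized logbase2ratio. The function names `fit_line` and `regression_normalize` are hypothetical, this is not presented as the author's exact procedure, and the iterative and sector-wise aspects of the method are not shown.

```python
import math
from statistics import mean, pstdev

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def regression_normalize(red, green):
    """Return (raw log2 ratios, regression-normalized log2 ratios).

    Assumes red and green are positive, background-corrected intensities
    for the same spots (an assumption of this sketch). The normalized
    value is the residual of log2(red) about the fitted line.
    """
    x = [math.log2(g) for g in green]
    y = [math.log2(r) for r in red]
    raw = [yi - xi for xi, yi in zip(x, y)]          # plain log2(red/green)
    a, b = fit_line(x, y)
    normalized = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return raw, normalized
```

If the fitted intercept is 0 and the slope is 1, the residual reduces to the plain log2(red/green) ratio. Any departure of the fitted line from y = x, such as an intensity-dependent dye bias, is what this normalization removes, and that removal is why the spread of the normalized ratios shrinks.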
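The comparison of regression residuals with leverage can be sketched as follows, again for a simple regression of log2(red) on log2(green). The sketch assumes internally studentized residuals and uses the diagonal of the hat matrix as the leverage measure. Its cutoffs are a residual threshold of 2 and the common 2p/n = 4/n leverage rule of thumb. These cutoffs and the names `residual_leverage` and `candidate_genes` are illustrative assumptions, not values or functions taken from this work.

```python
import math

def residual_leverage(x, y):
    """OLS fit of y = a + b*x; return (studentized residuals, leverages)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (intercept + slope).
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage = diagonal of the hat matrix for simple linear regression.
    lev = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    stud = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, lev)]
    return stud, lev

def candidate_genes(names, x, y, resid_cutoff=2.0):
    """Rank genes whose |studentized residual| exceeds resid_cutoff.

    Each entry is (name, studentized residual, leverage, high_leverage),
    where high_leverage applies the assumed 2p/n rule of thumb with p = 2.
    """
    stud, lev = residual_leverage(x, y)
    lev_cutoff = 4.0 / len(x)
    hits = [(nm, r, h, h > lev_cutoff)
            for nm, r, h in zip(names, stud, lev) if abs(r) > resid_cutoff]
    return sorted(hits, key=lambda t: abs(t[1]), reverse=True)
```

Genes flagged as high-leverage lie at extreme intensities and pull the fitted line toward themselves, so their raw residuals tend to understate how far they depart from the trend. Such genes may warrant separate inspection rather than automatic ranking.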
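The pin-level check and correction can be sketched in the same spirit. A one-way ANOVA F statistic is computed on the logbase2ratios grouped by printing pin, and then a separate least-squares fit of log2(red) on log2(green) is made within each pin. Only the F statistic is computed here; a p-value would additionally need the F distribution, for example `scipy.stats.f.sf`. The helpers `group_by_pin`, `one_way_anova_f` and `normalize_by_pin` are hypothetical names for this illustration.

```python
from collections import defaultdict
from statistics import mean

def group_by_pin(pins, values):
    """Collect values into a dict keyed by printing pin."""
    groups = defaultdict(list)
    for pin, v in zip(pins, values):
        groups[pin].append(v)
    return dict(groups)

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a dict {pin: [logbase2ratios]}."""
    means = {g: mean(v) for g, v in groups.items()}
    all_vals = [x for v in groups.values() for x in v]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def normalize_by_pin(pins, log_red, log_green):
    """Fit log_red = a + b*log_green separately per pin; return residuals."""
    idx_by_pin = defaultdict(list)
    for i, pin in enumerate(pins):
        idx_by_pin[pin].append(i)
    out = [0.0] * len(pins)
    for idxs in idx_by_pin.values():
        x = [log_green[i] for i in idxs]
        y = [log_red[i] for i in idxs]
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        a = my - b * mx
        for i, xi, yi in zip(idxs, x, y):
            out[i] = yi - (a + b * xi)   # residual within this pin
    return out
```

Because each per-pin fit includes an intercept, the normalized values average to zero within every pin. The between-pin sum of squares therefore collapses, and the F statistic for pin drops to essentially zero after correction.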