Normalization of oligonucleotide arrays based on the least-variant set of genes

Background
It is well known that the choice of normalization method for microarray data affects the downstream analysis. All normalization methods rely on certain assumptions, so differences in results can be traced to different sensitivities to violation of those assumptions. Illustrating this lack of robustness, in a striking spike-in experiment all existing normalization methods fail because of an imbalance between up- and down-regulated genes. It is therefore still important to develop a normalization method that is robust against violation of the standard assumptions.

Results
We develop a new algorithm based on identification of the least-variant set (LVS) of genes across the arrays. The array-to-array variation is evaluated in the robust linear model fit of pre-normalized probe-level data. The LVS genes are then used as a reference set for a non-linear normalization. The method is applicable to any existing expression summaries, such as MAS5 or RMA.

Conclusion
We show that LVS normalization outperforms other normalization methods when the standard assumptions are not satisfied. In the complex spike-in study, LVS performs similarly to the ideal (in practice unknown) housekeeping-gene normalization. An R package called lvs is available at http://www.meb.ki.se/~yudpaw.
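The idea behind the Results paragraph can be illustrated with a small sketch. It is not the lvs package and makes several simplifying assumptions that are not in the source: it works on already-summarized log-expression values rather than probe-level data, it measures array-to-array variation with the median absolute deviation (MAD) instead of a robust linear model fit, and it uses a straight-line fit on the reference genes in place of the non-linear fit. The function names and the `proportion` parameter are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: a robust measure of spread."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def least_variant_set(expr, proportion=0.5):
    """Return the genes with the smallest array-to-array variation.

    expr maps a gene name to its list of log-expression values, one per array.
    ASSUMPTION: MAD of summarized values stands in for the variance component
    from a robust linear model fit of probe-level data.
    """
    ranked = sorted(expr, key=lambda gene: mad(expr[gene]))
    keep = max(2, int(round(len(ranked) * proportion)))
    return ranked[:keep]


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 1.0
    return slope, mean_y - slope * mean_x


def normalize_to_reference_set(expr, proportion=0.5):
    """Normalize every array against a per-gene median reference.

    The fit for each array uses only the least-variant genes, but the fitted
    curve is then applied to all genes on that array.
    ASSUMPTION: a straight line replaces the non-linear fit of the source.
    """
    genes = list(expr)
    n_arrays = len(expr[genes[0]])
    reference_genes = least_variant_set(expr, proportion)
    reference = {gene: median(expr[gene]) for gene in genes}

    normalized = {gene: [] for gene in genes}
    for j in range(n_arrays):
        slope, intercept = fit_line(
            [expr[gene][j] for gene in reference_genes],
            [reference[gene] for gene in reference_genes],
        )
        for gene in genes:
            normalized[gene].append(slope * expr[gene][j] + intercept)
    return normalized
```

Because only the reference genes drive the fit, a gene that is truly differentially expressed does not pull the normalization curve toward itself, which is the property the abstract relies on when up- and down-regulation are imbalanced.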
