Quantile normalization (QN) is a technique for processing microarray data and is the default normalization method in the Robust Multi-array Average (RMA) procedure, which was designed primarily for analysing gene expression data from Affymetrix arrays. Given how widely Affymetrix microarrays and the RMA method are used, it is crucially important that the normalization procedure is applied appropriately. In this study we carried out simulation experiments and analysed real microarray data to investigate whether RMA is suitable when it is applied to datasets containing different groups of biological samples. Our experiments show that RMA with QN does not preserve the biological signal within each group but instead mixes the signals between groups. We also show that the Median Polish method used in the summarization step of RMA has a similar mixing effect. RMA is one of the most widely used methods for processing microarray data and has been applied to a vast volume of data in biomedical research. Its problematic behaviour suggests that previous studies employing RMA could have been misled or adversely affected. We therefore think it is important that the research community recognizes the issue and starts to address it. Because both core elements of RMA, quantile normalization and Median Polish, mix biological signals between sample groups, they can undermine valid biological conclusions and any subsequent analyses. Based on the evidence presented here and in the literature, we recommend caution when using RMA to process microarray gene expression data, particularly where unknown subgroups of samples are likely to be present.
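The mixing effect of quantile normalization can be seen on a toy example. The following is a minimal pure-Python sketch under stated assumptions, not the RMA implementation used in practice (such as `rma()` in the Bioconductor affy package): `quantile_normalize` and `median_polish` are hypothetical helper names, background correction and the log2 transform are omitted, ties in QN are broken by row order, and the data are invented for illustration. Group B differs from group A only in gene 0, yet after QN genes 1 and 2, which were identical in both groups, differ between groups, because every sample is forced onto one reference distribution averaged over both groups.

```python
from statistics import median


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    # Reference distribution: mean of the k-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / n_cols for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Give each value the reference value at its within-sample rank.
        # Ties are broken by row order (a simplification for this sketch).
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result


def median_polish(x, max_iter=10, tol=1e-9):
    """Tukey's median polish: x[i][j] = overall + row[i] + col[j] + resid[i][j]."""
    n_r, n_c = len(x), len(x[0])
    resid = [list(map(float, r)) for r in x]
    overall, row, col = 0.0, [0.0] * n_r, [0.0] * n_c
    for _ in range(max_iter):
        # Sweep row (probe) medians into the row effects.
        rdelta = [median(r) for r in resid]
        for i in range(n_r):
            for j in range(n_c):
                resid[i][j] -= rdelta[i]
            row[i] += rdelta[i]
        shift = median(col)
        col = [c - shift for c in col]
        overall += shift
        # Sweep column (array) medians into the column effects.
        cdelta = [median([resid[i][j] for i in range(n_r)]) for j in range(n_c)]
        for j in range(n_c):
            for i in range(n_r):
                resid[i][j] -= cdelta[j]
            col[j] += cdelta[j]
        shift = median(row)
        row = [r - shift for r in row]
        overall += shift
        # Stop once a full sweep no longer moves anything (up to tol).
        if max(map(abs, rdelta + cdelta)) < tol:
            break
    return overall, row, col, resid


# Toy data: 3 genes x 4 samples (columns A1, A2, B1, B2).
# Only gene 0 differs between groups; genes 1 and 2 are identical.
data = [
    [1, 1, 7, 7],
    [3, 3, 3, 3],
    [5, 5, 5, 5],
]
qn = quantile_normalize(data)
for row in qn:
    print(row)
# [2.0, 2.0, 6.0, 6.0]  gene 0: group difference shrinks from 6 to 4
# [4.0, 4.0, 2.0, 2.0]  gene 1: spurious difference of 2 appears
# [6.0, 6.0, 4.0, 4.0]  gene 2: spurious difference of 2 appears

# One probe set, 3 probes x 4 arrays, purely additive: arrays in B are +2.
probes = [
    [0, 0, 2, 2],
    [1, 1, 3, 3],
    [2, 2, 4, 4],
]
overall, row_eff, col_eff, resid = median_polish(probes)
expression = [overall + c for c in col_eff]
print(expression)  # [1.0, 1.0, 3.0, 3.0]
```

In an RMA-style summary, `overall + col_eff[j]` is the expression estimate for array `j`. The probe effects `row_eff` are estimated from medians taken across all arrays, so arrays from every group contribute to them. In this purely additive toy case the +2 shift is recovered exactly; the sketch only shows where pooling across groups happens in the summarization step and does not reproduce the mixing reported above.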