Correcting for intra-experiment variation in Illumina BeadChip data is necessary to generate robust gene-expression profiles

Background: Microarray technology is a popular means of producing whole-genome transcriptional profiles; however, the high cost and scarcity of mRNA have led many studies to rely on the analysis of single samples. We exploit the design of the Illumina platform, specifically the multiple arrays on each chip, to evaluate intra-experiment technical variation using repeated hybridisations of universal human reference RNA (UHRR) and duplicate hybridisations of primary breast tumour samples from a clinical study.

Results: A clear batch-specific bias was detected in the measured expression of both the UHRR and clinical samples. This bias persisted after standard microarray normalisation techniques were applied. However, when mean-centering or empirical Bayes batch-correction methods (ComBat) were applied to the data, inter-batch variation in the UHRR and clinical samples was greatly reduced. Batch-correction using ComBat improved the correlation between replicate UHRR samples by two orders of magnitude (from a range of 0.9833-0.9991 to a range of 0.9997-0.9999) and increased the consistency of the gene-lists from the duplicate clinical samples from 11.6% in quantile-normalised data to 66.4% in batch-corrected data. The use of UHRR as an inter-batch calibrator provided a small additional benefit when used in conjunction with ComBat, further increasing the agreement between the two gene-lists to 74.1%.

Conclusion: In the interests of practicality and cost, these results suggest that single samples can generate reliable data, but only after careful compensation for technical bias in the experiment. We recommend that investigators consider the propensity for such variation at the design stage of a microarray experiment, and that suitable correction methods become routine during the statistical analysis of the data.
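The mean-centering correction named in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce ComBat. It assumes log-scale expression values arranged as rows of probes and columns of samples, with one batch label per sample. The function name `mean_center_by_batch` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def mean_center_by_batch(expr, batches):
    """Subtract each probe's within-batch mean from its expression values.

    expr    : list of rows, one row per probe, one value per sample
              (log-scale values are assumed).
    batches : list of batch labels, one per sample column.
    """
    n_samples = len(batches)

    # Group column indices by batch label.
    cols_by_batch = defaultdict(list)
    for col, label in enumerate(batches):
        cols_by_batch[label].append(col)

    centered = []
    for row in expr:
        if len(row) != n_samples:
            raise ValueError("each probe row needs one value per sample")
        new_row = [0.0] * n_samples
        for cols in cols_by_batch.values():
            # Per-probe mean over the samples in this batch only.
            batch_mean = sum(row[c] for c in cols) / len(cols)
            for c in cols:
                new_row[c] = row[c] - batch_mean
        centered.append(new_row)
    return centered


# Toy data (hypothetical): 3 probes, 4 samples, two batches.
# Batch "B" carries an additive offset of +2 on every probe.
toy = [
    [5.0, 6.0, 7.0, 8.0],
    [1.0, 3.0, 3.0, 5.0],
    [4.0, 4.0, 6.0, 6.0],
]
toy_batches = ["A", "A", "B", "B"]
corrected = mean_center_by_batch(toy, toy_batches)
```

After centering, the two batches in the toy data share the same per-probe profile, because the additive offset has been removed. Mean-centering of this kind implicitly assumes that each batch contains a comparable mix of samples; if batch membership is confounded with biology, it can also remove real signal.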
