Microarray-Based RNA Profiling of Breast Cancer: Batch Effect Removal Improves Cross-Platform Consistency

Microarray profiling is a powerful and widely used technique for gene expression analysis. Several technologies are available, but the lack of standardization makes it challenging to compare and integrate data. Furthermore, batch-related biases within datasets are common but often left uncorrected. We analyzed the same 234 breast cancers on two different microarray platforms. One dataset contained known batch effects associated with the fabrication procedure used. The aim was to assess the significance of correcting for systematic batch effects when integrating data from different platforms. We demonstrate the importance of detecting batch effects and show how tools such as ComBat can be used to overcome such systematic variation and unmask essential biological signals. Batch adjustment was found to be particularly valuable for detecting subtler differences in gene expression. Furthermore, our results show that proper adjustment is essential for integrating gene expression data obtained from multiple sources. We show that high-variance genes are expressed with high reproducibility across platforms, making them particularly well suited as biomarkers and for building gene signatures, as exemplified by prediction of estrogen-receptor status and molecular subtypes. In conclusion, the study emphasizes the importance of using proper batch adjustment methods when integrating data across different batches and platforms.
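To make the idea of batch adjustment concrete, the following is a minimal sketch of a plain location-scale correction for a single gene. It is not ComBat itself: ComBat additionally applies empirical Bayes shrinkage to the batch parameters and can preserve known biological covariates, neither of which is done here. The function name `adjust_batches` and the example values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def adjust_batches(values, batches):
    """Location-scale adjust one gene's expression values across batches.

    values  : list of floats, one per sample (hypothetical log-expression)
    batches : list of batch labels, same length as values

    Each batch is centered on the grand mean and rescaled to the pooled
    within-batch standard deviation. This is a simplification and not the
    empirical Bayes procedure used by ComBat.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    grand_mean = mean(values)

    # Group sample indices by batch label.
    groups = {}
    for i, label in enumerate(batches):
        groups.setdefault(label, []).append(i)

    # Pooled within-batch standard deviation, used as the target scale.
    centered = []
    for idx in groups.values():
        batch_mean = mean(values[i] for i in idx)
        centered.extend(values[i] - batch_mean for i in idx)
    pooled_sd = pstdev(centered)

    adjusted = [0.0] * len(values)
    for idx in groups.values():
        batch_values = [values[i] for i in idx]
        batch_mean = mean(batch_values)
        batch_sd = pstdev(batch_values)
        # A constant batch cannot be rescaled; only shift it in that case.
        scale = pooled_sd / batch_sd if batch_sd > 0 else 1.0
        for i in idx:
            adjusted[i] = grand_mean + (values[i] - batch_mean) * scale
    return adjusted


# Hypothetical example: two batches with a systematic offset of 10 units.
example_values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
example_batches = ["A", "A", "A", "B", "B", "B"]
example_adjusted = adjust_batches(example_values, example_batches)
```

Note that a correction of this kind also removes any genuine biological difference that happens to coincide with batch, which is one reason a balanced design across batches matters when integrating data.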
