Controlling technical variation amongst 6693 patient microarrays of the randomized MINDACT trial

Gene expression data obtained in large studies hold great promise for discovering disease signatures or subtypes through data analysis. Such data are also prone to technical variation, whose removal is essential to avoid spurious discoveries. Because this variation is not always known and can be confounded with biological signals, removing it is challenging. Here we provide a step-wise procedure and a comprehensive analysis of the MINDACT microarray dataset. The MINDACT trial enrolled 6693 breast cancer patients and prospectively validated the gene expression signature MammaPrint for outcome prediction. The study also yielded a full-transcriptome microarray for each tumor. We show for the first time in such a large dataset how technical variation can be removed while retaining expected biological signals. Because of its unprecedented size, we hope the resulting adjusted dataset will be an invaluable tool to discover or test gene expression signatures and to advance our understanding of breast cancer.

Laurent Jacob et al. develop a workflow and analytical pipeline to remove technical variation from the MINDACT microarray dataset. Their method preserves biological signals, and the normalized datasets can be repurposed for the discovery of other biomarkers and signatures for breast cancer.
