Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays

Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. To what extent batch effects, that is, measurement error in biomarker levels between slides, affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1,448 primary prostate cancers. For half of the biomarkers, more than 10% of biomarker variance was attributable to between-TMA differences (range, 1–48%). We implemented different methods to mitigate batch effects (R package batchtma) and tested them in plasmode simulation. Biomarker levels obtained with different mitigation approaches were more similar to each other than to uncorrected values. For some biomarkers, associations with clinical features changed substantially after addressing batch effects. Batch effects and the resulting bias are not an error of an individual study but an inherent feature of TMA-based protein biomarker studies. They always need to be considered during study design and addressed analytically in studies using more than one TMA.
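
To make the quantity "share of biomarker variance attributable to between-TMA differences" concrete, the following minimal Python sketch estimates it with a one-way random-effects ANOVA and then applies simple batch mean-centering. It is an illustration under stated assumptions, not the batchtma package or its API, and not necessarily the estimator used in the study. The function names, the TMA labels, and the simulated data are all hypothetical. Mean-centering additionally assumes that the TMAs do not differ in their true biomarker levels, which may not hold when tumor characteristics differ between TMAs.

```python
import random
from statistics import mean


def between_batch_variance_fraction(values_by_batch):
    """Estimate the share of total variance that lies between batches.

    Uses the classic one-way random-effects ANOVA estimator, which also
    handles batches of unequal size. Hypothetical helper for illustration.
    """
    groups = [g for g in values_by_batch.values() if len(g) > 0]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective batch size for unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_between = max(0.0, (ms_between - ms_within) / n0)
    return var_between / (var_between + ms_within)


def mean_center_batches(values_by_batch):
    """Shift every batch so that its mean equals the overall mean.

    Assumption: batches do not differ in their true biomarker levels.
    """
    grand = mean(x for g in values_by_batch.values() for x in g)
    return {b: [x - mean(g) + grand for x in g]
            for b, g in values_by_batch.items()}


# Simulated example: three hypothetical TMAs with additive batch shifts.
rng = random.Random(0)
batch_shifts = {"TMA1": 0.0, "TMA2": 0.8, "TMA3": -0.5}
data = {b: [rng.gauss(shift, 1.0) for _ in range(100)]
        for b, shift in batch_shifts.items()}

before = between_batch_variance_fraction(data)
after = between_batch_variance_fraction(mean_center_batches(data))
print(f"between-batch share before: {before:.2f}, after: {after:.2f}")
```

In this simulation the between-batch share drops to zero after centering because the shifts are purely additive. In real TMA data, batches can also differ in variance or in the composition of included tumors, which is one reason more than one mitigation method can be worth comparing.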
