Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies

Multicenter studies are needed to demonstrate the potential clinical value of radiomics as a prognostic tool. However, variability in scanner models, acquisition protocols and reconstruction settings is unavoidable, and radiomic features are notoriously sensitive to these factors, which hinders pooling them in a statistical analysis. ComBat, a statistical harmonization method originally developed to deal with the “batch effect” in gene expression microarray data, has been used in radiomics studies to deal with the “center effect”. Our goal was to evaluate modifications of ComBat that allow more flexibility in choosing a reference and improve the robustness of the estimation. Two modifications were evaluated: M-ComBat transforms all feature distributions to a chosen reference instead of the overall mean, providing more flexibility, and B-ComBat adds bootstrap and Monte Carlo for improved robustness in the estimation. A third version, BM-ComBat, combines both modifications. The four versions (standard ComBat, M-ComBat, B-ComBat and BM-ComBat) were compared regarding their ability to harmonize features in a multicenter context in two different clinical datasets. The first contains 119 patients with locally advanced cervical cancer from 3 centers, imaged with magnetic resonance imaging and positron emission tomography. In that case, ComBat was applied with 3 labels, one per center. The second contains 98 patients with locally advanced laryngeal cancer from 5 centers, imaged with contrast-enhanced computed tomography. In that case, because imaging settings were highly heterogeneous even within each of the five centers, unsupervised clustering was used to determine two labels for applying ComBat. The impact of each harmonization was evaluated through three different machine learning pipelines used to build models predicting the clinical outcomes, with two performance metrics (balanced accuracy and Matthews correlation coefficient).
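To make the two modifications concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain location-scale adjustment per batch and omits the empirical Bayes shrinkage and covariate handling of the full ComBat model. The function names `batch_stats` and `align_to_reference` and the parameter `n_resamples` are hypothetical. Aligning every batch to a chosen reference batch mirrors the idea behind M-ComBat, and averaging the batch estimates over bootstrap resamples mirrors the idea behind B-ComBat.

```python
import random
import statistics


def batch_stats(members, n_resamples=0, seed=0):
    """Return (mean, SD) of one batch, optionally averaged over bootstrap resamples."""
    if n_resamples == 0:
        return statistics.mean(members), statistics.stdev(members)
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_resamples):
        # Resample the batch with replacement and re-estimate its parameters.
        sample = rng.choices(members, k=len(members))
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))
    # Monte Carlo average of the bootstrapped estimates.
    return statistics.mean(means), statistics.mean(sds)


def align_to_reference(values, labels, reference, n_resamples=0):
    """Shift and rescale every batch so it matches the reference batch.

    Simplifying assumption: one feature, no covariates, no empirical Bayes step.
    """
    stats = {}
    for batch in sorted(set(labels)):
        members = [v for v, b in zip(values, labels) if b == batch]
        stats[batch] = batch_stats(members, n_resamples)
    ref_mean, ref_sd = stats[reference]
    return [
        (v - stats[b][0]) / stats[b][1] * ref_sd + ref_mean
        for v, b in zip(values, labels)
    ]


# Toy feature measured in two centers with different offset and spread.
feature = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
centers = ["A", "A", "A", "B", "B", "B"]
harmonized = align_to_reference(feature, centers, reference="A")
print(harmonized)
```

In this toy case the values from center B are mapped onto the mean and standard deviation of center A, while the values of center A are left unchanged. Passing a positive `n_resamples` switches to the bootstrapped estimates.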
Before harmonization, almost all radiomic features had significantly different distributions between labels. These differences were successfully removed with all ComBat versions. The predictive ability of the radiomic models was always improved with harmonization, and the modified ComBat versions provided the best results. This was observed consistently in both datasets, across all machine learning pipelines and performance metrics. The proposed modifications allow for more flexibility and robustness in the estimation. They also slightly but consistently improve the predictive power of the resulting radiomic models.
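For reference, the two performance metrics named above can be computed from the binary confusion matrix as sketched below. This is an illustrative sketch with hypothetical helper names, assuming labels coded as 0 and 1 and both classes present in the true labels. It is not taken from the study's pipelines.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for labels coded 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy example: 4 positive and 4 negative patients, one error in each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))   # 0.75
print(matthews_corrcoef(y_true, y_pred))   # 0.5
```

Both metrics account for class imbalance, which plain accuracy does not, so they are less likely to reward a model that always predicts the majority outcome.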
