Impact of localized fine tuning in the performance of segmentation and classification of lung nodules from computed tomography scans using deep learning

Background An algorithm may malfunction when there is a mismatch between the dataset with which it was developed and the dataset on which it is deployed.
Methods A baseline segmentation algorithm and a baseline classification algorithm were developed using the public Lung Image Database Consortium dataset to detect benign and malignant nodules. Two additional external datasets (HB and XZ), comprising 542 and 486 cases respectively, were used for independent validation of these two algorithms. To explore the impact of localized fine tuning on the segmentation and classification processes individually, the baseline algorithms were fine tuned with CT scans from the HB and XZ datasets, respectively, and the performance of the fine tuned algorithms was compared with that of the baseline algorithms.
Results Both the baseline segmentation algorithm and the baseline classification algorithm experienced a performance drop when deployed directly on the external HB and XZ datasets. Compared with the baseline validation results in nodule segmentation, the fine tuned segmentation algorithm obtained better Dice coefficient, Intersection over Union, and Average Surface Distance on the HB dataset (0.593 vs. 0.444; 0.450 vs. 0.348; 0.283 vs. 0.304) and the XZ dataset (0.601 vs. 0.486; 0.482 vs. 0.378; 0.225 vs. 0.358). Similarly, compared with the baseline validation results in benign and malignant nodule classification, the fine tuned classification algorithm had improved area under the receiver operating characteristic curve, accuracy, and F1 score on the HB dataset (0.851 vs. 0.812; 0.813 vs. 0.769; 0.852 vs. 0.822) and the XZ dataset (0.724 vs. 0.668; 0.696 vs. 0.617; 0.737 vs. 0.668).
Conclusions In external validation, the locally fine tuned algorithms outperformed the baseline algorithms in both segmentation and classification, which suggests that localized fine tuning may be an effective way to enable a baseline algorithm to generalize to site-specific use.
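
For readers less familiar with the reported metrics, the following is a minimal sketch of how the Dice coefficient, Intersection over Union, accuracy, and F1 score are commonly computed. It is not the authors' evaluation code. The function names, the flattened 0/1 mask representation, and the toy inputs are assumptions made for illustration, and the toy values are unrelated to the HB and XZ results. Average Surface Distance and the area under the receiver operating characteristic curve are omitted for brevity.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two equal-length sequences of 0/1 voxel labels.

    Hypothetical helper: masks are assumed to be flattened to 1-D sequences.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a
        # convention assumed here, not something stated in the abstract.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + truth_size)
    iou = intersection / union
    return dice, iou


def accuracy_and_f1(pred_labels, true_labels, positive=1):
    """Return (accuracy, F1) for binary labels, e.g. 1 = malignant, 0 = benign."""
    if len(pred_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(pred_labels, true_labels))
    tp = sum(1 for p, t in pairs if p == positive and t == positive)
    fp = sum(1 for p, t in pairs if p == positive and t != positive)
    fn = sum(1 for p, t in pairs if p != positive and t == positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy data only, chosen so the arithmetic is easy to follow.
    dice, iou = dice_and_iou([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0])
    acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(f"Dice={dice:.3f} IoU={iou:.3f} accuracy={acc:.3f} F1={f1:.3f}")
```

Dice and Intersection over Union both measure overlap, so higher is better, whereas Average Surface Distance is a distance, so the lower fine tuned values reported above indicate improvement.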
