Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images

Pre-training on large labeled natural-image datasets such as ImageNet has become the gold standard in medical image analysis. However, the emergence of self-supervised learning (SSL), which leverages unlabeled data to learn robust features, presents an opportunity to bypass the labor-intensive labeling process. In this study, we explored whether SSL pre-training on non-medical images transfers to chest radiographs and how it compares to supervised learning (SL) pre-training on non-medical and on medical images. We utilized a vision transformer and initialized its weights based on (i) SSL pre-training on natural images (DINOv2), (ii) SL pre-training on natural images (ImageNet), and (iii) SL pre-training on chest radiographs from the MIMIC-CXR database. We tested our approach on over 800,000 chest radiographs from six large global datasets, covering more than 20 different imaging findings. SSL pre-training on curated natural images not only outperformed ImageNet-based SL pre-training (P<0.001 for all datasets) but, in certain cases, also exceeded SL pre-training on the MIMIC-CXR dataset. Our findings suggest that selecting the right pre-training strategy, especially with SSL, can be pivotal for improving the diagnostic accuracy of artificial intelligence (AI) models in medical imaging. By demonstrating the promise of SSL in chest radiograph analysis, we underline a shift towards more efficient and accurate AI models in medical imaging.
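
The following is a minimal sketch of how the three weight-initialization strategies could be set up in PyTorch. The specific model variants (ViT-B/14 for DINOv2, ViT-B/16 for ImageNet via timm), the classification-head design, the number of output findings, and the checkpoint path for MIMIC-CXR weights are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import timm

NUM_FINDINGS = 20  # multi-label chest-radiograph findings (illustrative count)

# (i) SSL pre-training on natural images: DINOv2 ViT-B/14 backbone from torch.hub
dinov2_backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")

# (ii) SL pre-training on natural images: ImageNet-pretrained ViT-B/16 via timm,
#      with num_classes=0 so the model returns pooled features instead of logits
imagenet_backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)

# (iii) SL pre-training on chest radiographs: in practice, weights from a supervised
#       MIMIC-CXR training run would be loaded from a checkpoint, e.g.
#       imagenet_backbone.load_state_dict(torch.load("mimic_cxr_vit.pt"))  # hypothetical path


class ChestXrayClassifier(nn.Module):
    """ViT backbone plus a linear multi-label classification head."""

    def __init__(self, backbone: nn.Module, embed_dim: int = 768):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(embed_dim, NUM_FINDINGS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x)   # (batch, embed_dim) image embedding
        return self.head(features)    # raw logits; pair with BCEWithLogitsLoss


# Two of the three initialization strategies; training and evaluation then proceed
# identically on the downstream chest-radiograph datasets.
model_ssl = ChestXrayClassifier(dinov2_backbone)
model_sl_imagenet = ChestXrayClassifier(imagenet_backbone)
```

In this sketch only the source of the backbone weights differs between strategies; the downstream fine-tuning pipeline (data, head, loss) is held fixed so that any performance difference can be attributed to the initialization.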
