Scaling Self-Supervised Learning for Histopathology with Masked Image Modeling

Computational pathology is revolutionizing the field of pathology by integrating advanced computer vision and machine learning technologies into diagnostic workflows. Recently, self-supervised learning (SSL) has emerged as a promising solution for learning representations from histology patches, leveraging large volumes of unannotated whole slide images (WSIs). In particular, Masked Image Modeling (MIM) has shown remarkable results and robustness compared with purely contrastive learning methods. In this work, we explore the application of MIM to histology using iBOT, a self-supervised transformer-based framework. Through a wide range of downstream tasks over seven cancer indications, we provide recommendations on the pre-training of large models for histology data using MIM. First, we demonstrate that in-domain pre-training with iBOT outperforms both ImageNet pre-training and a model pre-trained with a purely contrastive learning objective, MoCo v2. Second, we show that Vision Transformer (ViT) models, when scaled appropriately, can learn pan-cancer representations that benefit a large variety of downstream tasks. Finally, our iBOT ViT-Base model, pre-trained on more than 40 million histology images from 16 different cancer types, achieves state-of-the-art performance on most weakly-supervised WSI classification tasks compared with other SSL frameworks.
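
To make the weakly-supervised WSI classification setup concrete, the sketch below aggregates frozen patch embeddings from a pre-trained ViT-Base encoder into a slide-level prediction using gated attention-based multiple instance learning. This is a minimal illustration under stated assumptions: the 768-dimensional feature size, hidden width, and class count are illustrative, not the paper's exact configuration, and the embeddings are assumed to come from a frozen iBOT-pretrained patch encoder.

```python
# Minimal sketch: slide-level weakly-supervised classification from frozen
# patch embeddings via gated attention MIL. Dimensions and class count are
# illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn


class GatedAttentionMIL(nn.Module):
    """Aggregates the patch embeddings of one slide into slide-level logits."""

    def __init__(self, in_dim: int = 768, hidden_dim: int = 128, n_classes: int = 2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (n_patches, in_dim) for a single slide
        scores = self.attn_w(self.attn_v(patch_features) * self.attn_u(patch_features))
        weights = torch.softmax(scores, dim=0)                    # (n_patches, 1)
        slide_embedding = (weights * patch_features).sum(dim=0)   # (in_dim,)
        return self.classifier(slide_embedding)                   # (n_classes,)


if __name__ == "__main__":
    # Embeddings would be the CLS tokens of a frozen ViT-Base patch encoder
    # (assumed 768-dim); random features stand in here for illustration.
    model = GatedAttentionMIL(in_dim=768, n_classes=2)
    fake_slide = torch.randn(1000, 768)   # 1,000 tissue patches from one WSI
    logits = model(fake_slide)
    print(logits.shape)                   # torch.Size([2])
```

Only the slide-level label supervises training; the attention weights let the aggregator emphasize the patches most predictive of that label, which is why this family of aggregators is the standard evaluation head for frozen SSL patch features.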
