STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training

Large-scale models pre-trained on large-scale datasets have profoundly advanced deep learning. However, state-of-the-art models for medical image segmentation remain small, with parameter counts only in the tens of millions, and scaling them up by further orders of magnitude is rarely explored. An overarching goal of exploring large-scale models is to train them on large-scale medical segmentation datasets for better transfer capacity. In this work, we design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image segmentation model to date. STU-Net is built on the nnU-Net framework because of its popularity and impressive performance. We first refine the default convolutional blocks in nnU-Net to make them scalable. Then, we empirically evaluate different scaling combinations of network depth and width, and find that scaling depth and width together works best. We train our scalable STU-Net models on the large-scale TotalSegmentator dataset and find that increasing model size yields stronger performance, which suggests that large models are promising for medical image segmentation. Furthermore, we evaluate the transferability of our models on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various modalities and segmentation targets. Our pre-trained models perform well in both direct inference and fine-tuning. The code and pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net.
