DeepMCAT: Large-Scale Deep Clustering for Medical Image Categorization

In recent years, machine learning research in medical imaging has shifted markedly from supervised to semi-supervised, weakly supervised, and unsupervised methods, mainly because ground-truth labels are time-consuming and expensive to obtain manually. Generating labels from patient metadata may be feasible, but such labels suffer from user-originated errors, which introduce biases. In this work, we propose an unsupervised approach for automatically clustering and categorizing large-scale medical image datasets, with a focus on cardiac MR images, that uses no labels. We investigate end-to-end training on both class-balanced and imbalanced large-scale datasets. Our method achieved a cluster purity above 0.99 on these datasets. The results demonstrate the potential of the proposed method for categorizing large unstructured medical databases, for example for organizing clinical PACS systems in hospitals.
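The purity figure quoted above can be made concrete with a small sketch. It assumes the standard definition of cluster purity (each cluster is assigned its most frequent reference class, and purity is the fraction of samples that match their cluster's majority class); the exact evaluation protocol used by the authors is not given in this text. The function name `cluster_purity` and the view names in the example are hypothetical. Reference labels are used here only for evaluation, not for training.

```python
import numpy as np


def cluster_purity(cluster_ids, class_labels):
    """Purity under the standard definition (assumed, not taken from the paper):
    (1 / N) * sum over clusters of the size of the largest class in that cluster.
    """
    cluster_ids = np.asarray(cluster_ids)
    class_labels = np.asarray(class_labels)
    if cluster_ids.ndim != 1 or cluster_ids.shape != class_labels.shape:
        raise ValueError("inputs must be 1-D arrays of equal length")
    if cluster_ids.size == 0:
        raise ValueError("inputs must not be empty")

    # Map arbitrary cluster ids and class labels to 0..K-1 and 0..C-1.
    _, cluster_idx = np.unique(cluster_ids, return_inverse=True)
    _, class_idx = np.unique(class_labels, return_inverse=True)

    # Contingency table: rows are clusters, columns are reference classes.
    table = np.zeros((cluster_idx.max() + 1, class_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (cluster_idx, class_idx), 1)

    # Count the majority-class members of each cluster, then normalise by N.
    return float(table.max(axis=1).sum() / cluster_ids.size)


# Toy example with hypothetical view names: cluster 0 holds 2 "sax" and 1 "lax",
# cluster 1 holds 3 "lax", so 5 of 6 samples match their cluster's majority class.
clusters = [0, 0, 0, 1, 1, 1]
views = ["sax", "sax", "lax", "lax", "lax", "lax"]
purity = cluster_purity(clusters, views)
```

Purity rises trivially as the number of clusters grows (one cluster per sample gives a purity of 1.0), so a value such as 0.99 is most informative when read together with the number of clusters used.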
