Gigapixel Whole-Slide Image Classification Using Unsupervised Image Compression and Contrastive Training

We propose a novel two-step methodology for classifying entire whole-slide images (WSIs). First, all tissue patches in a WSI are mapped into vector embeddings by an encoder trained in an unsupervised fashion. The embeddings keep the spatial arrangement of their tissue patches, forming a stack of 2D feature maps that represents the WSI. Second, a convolutional neural network (CNN) is trained on these compact representations to predict weak labels associated with entire WSIs. We investigated several unsupervised schemes to train the encoder: convolutional autoencoders (CAE), variational autoencoders (VAE), and a novel approach based on contrastive training. We validated the methodology by predicting the presence of tumor metastasis at the WSI level on the Camelyon16 dataset. Our experimental results show that the proposed methodology can predict weak labels from entire WSIs, and that the contrastive encoder outperforms the CAE and VAE approaches.

Figure 1: Overview of the proposed method to predict patient outcome from entire WSIs. Left: encoder mapping tissue patches into embedding vectors. Center: feature extraction applied in a sliding-window fashion to an entire WSI. Right: training of a CNN-based classifier on compact representations of WSIs.
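The first (compression) step described above can be sketched as follows: each patch is encoded independently and its embedding is written back to the patch's grid position, so the output is a stack of 2D feature maps of shape (embedding dimension, rows, columns). This is a minimal illustration under stated assumptions, not the authors' implementation. The names `make_random_encoder` and `compress_wsi` are hypothetical, the random linear projection merely stands in for the trained CAE, VAE, or contrastive encoder, and the default stride (non-overlapping patches) is an assumption, since the abstract does not specify the sliding-window step.

```python
import numpy as np


def make_random_encoder(patch_size, embed_dim, seed=0):
    """Hypothetical stand-in for a trained encoder (CAE, VAE, or contrastive).

    It flattens an RGB patch and applies a fixed random linear projection, so
    it only illustrates the patch -> embedding interface, not learned features.
    """
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal(
        (embed_dim, patch_size * patch_size * 3)
    ).astype(np.float32)

    def encode(patch):
        return weights @ patch.reshape(-1).astype(np.float32)

    return encode


def compress_wsi(slide, encode, patch_size, embed_dim, stride=None, tissue_mask=None):
    """Encode every patch of `slide` and keep the embeddings on the patch grid.

    slide:       H x W x 3 array (a real gigapixel WSI would be read tile by tile).
    encode:      callable mapping a patch_size x patch_size x 3 patch to a
                 vector of length embed_dim.
    stride:      sliding-window step in pixels; defaults to patch_size
                 (non-overlapping patches, an assumption).
    tissue_mask: optional boolean array of shape (rows, cols); cells marked
                 False are treated as background and left as zero vectors.
    Returns a float32 array of shape (embed_dim, rows, cols), i.e. a stack of
    2D feature maps that a CNN classifier can take as input.
    """
    stride = stride or patch_size
    rows = (slide.shape[0] - patch_size) // stride + 1
    cols = (slide.shape[1] - patch_size) // stride + 1
    feature_maps = np.zeros((embed_dim, rows, cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            if tissue_mask is not None and not tissue_mask[r, c]:
                continue  # background patch: keep the zero embedding
            y, x = r * stride, c * stride
            patch = slide[y:y + patch_size, x:x + patch_size]
            feature_maps[:, r, c] = encode(patch)
    return feature_maps


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    slide = rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8)
    mask = np.ones((2, 3), dtype=bool)
    mask[0, 0] = False  # pretend the top-left patch is background
    encoder = make_random_encoder(patch_size=32, embed_dim=8)
    maps = compress_wsi(slide, encoder, patch_size=32, embed_dim=8, tissue_mask=mask)
    print(maps.shape)  # (8, 2, 3)
```

The compression ratio follows directly from the patch size and embedding dimension: in this toy setting each 32×32×3 patch (3,072 values) becomes 8 values, while the grid layout is preserved. That preserved layout is what lets the second step apply an ordinary CNN to the whole slide.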