Predicting colorectal cancer tumor mutational burden from histopathological images and clinical information using multi-modal deep learning

MOTIVATION Tumor mutational burden (TMB) is an indicator of the efficacy and prognosis of immune checkpoint therapy in colorectal cancer (CRC). Cancer patients with high TMB (TMB_H) values tend to benefit from immunotherapy, whereas those with low TMB (TMB_L) values tend not to. Although whole-exome sequencing (WES) is considered the gold standard for determining TMB, its high cost makes it difficult to apply in clinical practice. A few DNA panel-based methods can also estimate TMB; however, they are likewise costly, and the associated wet-lab experiments usually take days, which emphasizes the need for faster and cheaper alternatives.

METHODS In this study, we propose a multi-modal deep learning model based on a residual network (ResNet) and multi-modal compact bilinear pooling to predict TMB status (i.e., TMB_H or TMB_L) directly from histopathological images and clinical data. We applied the model to CRC data from The Cancer Genome Atlas and compared it with four other popular methods, namely, ResNet18, ResNet50, VGG19, and AlexNet. We tested different TMB thresholds, namely, percentiles of 10%, 14.3%, 15%, 16.3%, 20%, 30%, and 50%, to differentiate TMB_H from TMB_L.

RESULTS For the 14.3% percentile threshold (i.e., a TMB value of 20) and ResNet18, our model achieved an area under the receiver operating characteristic curve of 0.817 in five-fold cross-validation, which was higher than that of the other compared models. We also found that TMB values were significantly associated with tumor stage and with the N and M stages. Our study shows that deep learning models can predict TMB status from histopathological images and clinical information alone, and may therefore merit clinical application.
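
To make the fusion step in METHODS more concrete, the following is a minimal NumPy sketch of compact bilinear pooling in its commonly described form: each modality's feature vector is projected with a Count Sketch, and the two sketches are combined by circular convolution computed via FFT. It is an illustration under stated assumptions, not the implementation used in this study. The function names, the 8-dimensional clinical vector, the 1024-dimensional output, and the random inputs are all hypothetical; only the 512-dimensional image vector reflects the standard size of ResNet18 pooled features.

```python
import numpy as np


def make_sketch_params(input_dim, output_dim, rng):
    """Draw the random hash indices and signs that define one Count Sketch."""
    h = rng.integers(0, output_dim, size=input_dim)        # bucket index per input feature
    s = rng.choice(np.array([-1.0, 1.0]), size=input_dim)  # random sign per input feature
    return h, s


def count_sketch(x, h, s, output_dim):
    """Project x to output_dim by adding signed entries into hashed buckets."""
    y = np.zeros(output_dim)
    np.add.at(y, h, s * x)  # unbuffered add so colliding indices accumulate
    return y


def compact_bilinear_pool(x1, x2, params1, params2, output_dim):
    """Fuse two feature vectors: circular convolution of their Count Sketches.

    The convolution is computed as an element-wise product in the
    frequency domain, which equals the Count Sketch of the outer
    product of x1 and x2 without ever forming that outer product.
    """
    y1 = count_sketch(x1, *params1, output_dim)
    y2 = count_sketch(x2, *params2, output_dim)
    return np.fft.irfft(np.fft.rfft(y1) * np.fft.rfft(y2), n=output_dim)


# Hypothetical sizes and random stand-in features (assumptions, not study data).
rng = np.random.default_rng(0)
image_dim, clinical_dim, fused_dim = 512, 8, 1024
image_features = rng.standard_normal(image_dim)        # e.g. pooled ResNet18 output
clinical_features = rng.standard_normal(clinical_dim)  # e.g. encoded clinical variables

image_params = make_sketch_params(image_dim, fused_dim, rng)
clinical_params = make_sketch_params(clinical_dim, fused_dim, rng)

fused = compact_bilinear_pool(
    image_features, clinical_features, image_params, clinical_params, fused_dim
)
print(fused.shape)  # (1024,)
```

In a full model, the fused vector would typically be passed to a classifier head that outputs TMB_H versus TMB_L; that head, and how the clinical variables are encoded, are not specified here and would need to follow the paper's own description.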