Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology

In computational pathology, the adoption of decision support systems powered by state-of-the-art deep learning has been hampered by the lack of large labeled datasets. Until recently, studies relied on datasets on the order of a few hundred slides, which is not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset of 12,160 slides, two orders of magnitude larger than previous datasets in pathology and equivalent to 25 times the pixel count of the entire ImageNet dataset. A dataset of this size makes it possible to train a deep learning model under the Multiple Instance Learning (MIL) assumption, where only the overall slide diagnosis is needed for training, avoiding the expensive pixel-wise annotations that supervised learning approaches usually require. We test our framework on a complex task: prostate cancer diagnosis on needle biopsies. We thoroughly evaluated the performance of our MIL pipeline under several conditions, achieving an AUC of 0.98 on a held-out test set of 1,824 slides. These results open the way for training accurate diagnosis prediction models at scale, laying the foundation for the deployment of decision support systems in the clinic.
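To make the MIL assumption concrete, the following is a minimal sketch, not the pipeline described in the abstract. It assumes that a slide (the bag) has already been split into tiles (the instances), that some tile-level classifier has produced a tumor probability per tile, and that tile scores are aggregated with max pooling, the aggregation implied by the standard MIL assumption that a bag is positive if at least one instance is positive. The function names, the example probabilities, and the 0.5 threshold are all hypothetical.

```python
from typing import Sequence


def slide_probability(tile_probs: Sequence[float]) -> float:
    """Aggregate tile-level probabilities into one slide-level score.

    Assumption: max pooling, i.e. the slide is as suspicious as its
    most suspicious tile (standard MIL assumption).
    """
    if not tile_probs:
        raise ValueError("a slide (bag) needs at least one tile (instance)")
    return max(tile_probs)


def top_tile_index(tile_probs: Sequence[float]) -> int:
    """Index of the highest-scoring tile, usable as a coarse localization."""
    if not tile_probs:
        raise ValueError("a slide (bag) needs at least one tile (instance)")
    return max(range(len(tile_probs)), key=lambda i: tile_probs[i])


def predict_slide(tile_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Slide-level call; the 0.5 threshold is an illustrative assumption."""
    return slide_probability(tile_probs) >= threshold


# Hypothetical tile probabilities for two slides.
benign_slide = [0.02, 0.10, 0.05]
tumor_slide = [0.03, 0.91, 0.12]

print(predict_slide(benign_slide))   # no tile crosses the threshold
print(predict_slide(tumor_slide))    # one tile is enough to flag the slide
print(top_tile_index(tumor_slide))   # which tile drove the decision
```

Because only the slide-level output is compared with the slide diagnosis, no tile needs its own label during training, while the highest-scoring tile still indicates where on the slide the evidence lies.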
