Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles

We present ARCH, a computational pathology (CP) multiple instance captioning dataset to facilitate dense supervision of CP tasks. Existing CP datasets focus on narrow tasks; ARCH, on the other hand, contains dense diagnostic and morphological descriptions for a range of stains, tissue types, and pathologies. Using intrinsic dimensionality estimation, we show that ARCH is the only CP dataset to (ARCH-)rival its computer vision analog, MS-COCO Captions. We conjecture that an encoder pre-trained on dense image captions learns transferable representations for most CP tasks. We support the conjecture with evidence that ARCH representations transfer to a variety of pathology sub-tasks better than ImageNet features or representations obtained via self-supervised or multi-task learning on pathology images alone. We release our best model and invite other researchers to test it on their CP tasks.
