Multi-task weak supervision enables anatomically-resolved abnormality detection in whole-body FDG-PET/CT

Computational decision support systems could provide clinical value in whole-body FDG-PET/CT workflows. However, the limited availability of labeled data and the large size of PET/CT imaging exams make it challenging to apply existing supervised machine learning systems. Leveraging recent advances in natural language processing, we describe a weak supervision framework that extracts imperfect, yet highly granular, regional abnormality labels from free-text radiology reports. Our framework automatically labels each region in a custom ontology of anatomical regions, providing a structured profile of the pathologies in each imaging exam. Using these generated labels, we then train an attention-based, multi-task CNN architecture to detect abnormalities in whole-body scans and estimate their location. We demonstrate empirically that our multi-task representation is critical for strong performance on rare abnormalities with limited training data. The representation also contributes to more accurate mortality prediction from imaging data, suggesting the potential utility of our framework beyond abnormality detection and location estimation.
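
To make the idea of a per-region "structured profile" concrete, the following is a minimal toy sketch, not the framework described above. It assumes a tiny hypothetical ontology (ONTOLOGY), hypothetical keyword lists (KEYWORDS, ABNORMAL_CUES, NEGATION_CUES), and a simple keyword matcher standing in for the language-model-based labeler. It shows only how noisy sentence-level labels might be assigned to leaf regions and then propagated up a region hierarchy.

```python
# Toy sketch only: the ontology, keywords, and cue lists are hypothetical,
# and keyword matching stands in for a learned report labeler.
ONTOLOGY = {
    "whole_body": ["chest", "abdomen"],
    "chest": ["lungs", "mediastinal_lymph_nodes"],
    "abdomen": ["liver", "spleen"],
}
KEYWORDS = {
    "lungs": ["lung", "pulmonary"],
    "mediastinal_lymph_nodes": ["mediastinal"],
    "liver": ["liver", "hepatic"],
    "spleen": ["spleen", "splenic"],
}
ABNORMAL_CUES = ["fdg-avid", "hypermetabolic", "increased uptake"]
NEGATION_CUES = [" no ", " without "]


def label_leaves(report):
    """Assign a weak 0/1 abnormality label to every leaf region."""
    labels = {region: 0 for region in KEYWORDS}
    for sentence in report.split("."):
        text = " " + sentence.strip().lower()
        negated = any(cue in text for cue in NEGATION_CUES)
        abnormal = any(cue in text for cue in ABNORMAL_CUES) and not negated
        for region, words in KEYWORDS.items():
            if abnormal and any(word in text for word in words):
                labels[region] = 1
    return labels


def region_label(region, leaf_labels):
    """A parent region is abnormal if any of its children is abnormal."""
    if region in leaf_labels:
        return leaf_labels[region]
    return max(region_label(child, leaf_labels) for child in ONTOLOGY[region])


def profile(report):
    """Return a label for every region in the ontology, leaves and parents."""
    leaf_labels = label_leaves(report)
    regions = list(ONTOLOGY) + list(KEYWORDS)
    return {region: region_label(region, leaf_labels) for region in regions}


example = ("Hypermetabolic nodule in the right lung. "
           "No increased uptake in the liver or spleen.")
result = profile(example)
```

In this sketch each region's label could serve as the target for one task of a multi-task model; the labels are imperfect by construction, which is the sense in which the supervision is weak.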
