CheXseen: Unseen Disease Detection for Deep Learning Interpretation of Chest X-rays

We systematically evaluate the performance of deep learning models in the presence of diseases that are unlabeled in, or absent from, the training data. First, we evaluate whether deep learning models trained on a subset of diseases (seen diseases) can detect the presence of any one of a larger set of diseases. We find that models tend to falsely classify diseases outside of the subset (unseen diseases) as “no disease”. Second, we evaluate whether models trained on seen diseases can still detect those diseases when they co-occur with unseen diseases. We find that they can. Third, we evaluate whether the feature representations learned by these models can be used to detect unseen diseases, given a small labeled set of unseen-disease examples. We find that the penultimate layer of the deep neural network provides useful features for unseen disease detection. Our results can inform the safe clinical deployment of deep learning models trained on a non-exhaustive set of disease classes.
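
The third evaluation can be pictured as fitting a small classifier on top of frozen penultimate-layer features. The block below is a minimal sketch under stated assumptions, not the authors' implementation: the feature vectors are synthetic Gaussian stand-ins for real network activations, the probe is a plain logistic regression trained by gradient descent, and the names `fit_logistic_probe` and `predict_unseen_prob` are hypothetical.

```python
import numpy as np


def fit_logistic_probe(features, labels, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe on fixed feature vectors.

    features: (n, d) array standing in for penultimate-layer activations.
    labels:   (n,) array, 1 = unseen disease present, 0 = otherwise.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * (features.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def predict_unseen_prob(features, w, b):
    """Return the probe's probability that an unseen disease is present."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))


# Synthetic stand-in data (an assumption for illustration only): in practice
# the features would come from the frozen network trained on seen diseases.
rng = np.random.default_rng(0)
dim, n_per_class = 16, 50
shift = np.zeros(dim)
shift[0] = 3.0  # unseen-disease features differ along one direction

seen_only = rng.normal(size=(n_per_class, dim))
with_unseen = rng.normal(size=(n_per_class, dim)) + shift

X = np.vstack([seen_only, with_unseen])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

w, b = fit_logistic_probe(X, y)
probs = predict_unseen_prob(X, w, b)
train_acc = float(((probs > 0.5) == y).mean())
```

Only the probe's weights are trained here; the feature extractor stays fixed, which is what makes a small labeled set of unseen-disease examples plausible to work with. A real evaluation would score the probe on held-out studies rather than on the training examples used in this sketch.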
