Direct Uncertainty Prediction for Medical Second Opinions

Disagreement among human experts is ubiquitous in both machine learning and medicine. In medicine, it often takes the form of doctors disagreeing on a patient's diagnosis. In this work, we show that machine learning models can be trained to assign uncertainty scores to data instances that are likely to result in high expert disagreement. In particular, they can identify patient cases that would benefit most from a medical second opinion. Our central methodological finding is that Direct Uncertainty Prediction (DUP), which trains a model to predict an uncertainty score directly from the raw patient features, works better than Uncertainty Via Classification (UVC), the two-step process of training a classifier and postprocessing its output distribution to give an uncertainty score. We show this both with a theoretical result and with extensive evaluations on a large-scale medical imaging application.
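The difference between the two pipelines can be made concrete with a minimal sketch on synthetic data. It is an illustration under stated assumptions, not the models or scores used in this work: the pairwise-disagreement target, the `1 - sum(p^2)` postprocessing, the quadratic feature map, the linear and softmax models, and every function name below are hypothetical choices made only for this example.

```python
import numpy as np

def features(X):
    """Hypothetical feature map: raw features, their squares, and a bias column."""
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X ** 2, np.ones((len(X), 1))])

def pairwise_disagreement(labels):
    """Per-case fraction of expert pairs whose labels differ (assumed target)."""
    labels = np.asarray(labels)
    n = labels.shape[1]
    differ = labels[:, :, None] != labels[:, None, :]  # (cases, experts, experts)
    return differ.sum(axis=(1, 2)) / (n * (n - 1))     # ordered pairs, i != j

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)     # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(F, target_dist, steps=1000, lr=0.1):
    """Softmax regression by gradient descent on cross-entropy with soft targets."""
    W = np.zeros((F.shape[1], target_dist.shape[1]))
    for _ in range(steps):
        W -= lr * F.T @ (softmax(F @ W) - target_dist) / len(F)
    return W

def uvc_score(probs):
    """Postprocess a class distribution: chance that two independent draws differ."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - (probs ** 2).sum(axis=1)

def fit_least_squares(F, y):
    """Linear regressor standing in for the direct uncertainty model."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

def demo(seed=0, n_cases=200, n_experts=3):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_cases, 2))
    # Synthetic experts: each labels class 1 with a probability that depends on X.
    p1 = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))
    labels = (rng.random((n_cases, n_experts)) < p1[:, None]).astype(int)
    F = features(X)
    target = pairwise_disagreement(labels)

    # Uncertainty Via Classification: fit a classifier to the expert labels,
    # then turn its predicted distribution into an uncertainty score.
    hist = np.stack([(labels == k).mean(axis=1) for k in (0, 1)], axis=1)
    uvc = uvc_score(softmax(F @ fit_softmax(F, hist)))

    # Direct Uncertainty Prediction: regress the disagreement target
    # straight from the features, with no intermediate classifier.
    dup = F @ fit_least_squares(F, target)
    return uvc, dup, target

if __name__ == "__main__":
    uvc, dup, target = demo()
    print("corr(UVC score, observed disagreement):", np.corrcoef(uvc, target)[0, 1])
    print("corr(DUP score, observed disagreement):", np.corrcoef(dup, target)[0, 1])
```

The only structural difference is the training target: the UVC model never sees a disagreement score during training, while the DUP model is fit to one directly. The toy data and models here are too simple to say anything about which approach performs better in practice.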
