On Modelling Label Uncertainty in Deep Neural Networks: Automatic Estimation of Intra-Observer Variability in 2D Echocardiography Quality Assessment

Uncertainty of labels in clinical data resulting from intra-observer variability can have a direct impact on the reliability of assessments made by deep neural networks. In this paper, we propose a method for modelling such uncertainty in the context of 2D echocardiography (echo), which is a routine procedure for detecting cardiovascular disease at point-of-care. Echo image quality and acquisition time are highly dependent on the operator’s experience level. Recent developments have shown that echo image quality quantification can be automated by using deep learning to map an expert’s assessment of quality to the echo image. Nevertheless, observer variability in the expert’s assessment can reduce the accuracy of the quality quantification. Here, we model the intra-observer variability in echo quality assessment as an aleatoric uncertainty regression problem, and we introduce a novel method that handles this regression problem with categorical labels. A key feature of our design is that a single forward pass is sufficient to estimate the level of uncertainty for the network output. Compared to the 0.11 ± 0.09 absolute error (on a scale from 0 to 1) achieved by the conventional regression method, the proposed method brings the error down to 0.09 ± 0.08; the improvement is statistically significant and is equivalent to a 5.7% improvement in test accuracy. The simplicity of the proposed approach means that it could be generalized to other applications of deep learning in medical imaging, where there is often uncertainty in clinical labels.
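To make the idea of estimating aleatoric uncertainty in a single forward pass concrete, the following is a minimal sketch of the standard heteroscedastic Gaussian formulation, in which a network predicts both a mean and a log-variance for each input. It is not the method proposed in this paper, which is designed for categorical labels and is not specified in the abstract. The function names `forward` and `gaussian_nll`, the two linear heads, and the sigmoid squashing of the score into [0, 1] are all assumptions made for illustration.

```python
import numpy as np


def forward(features, w_mean, w_log_var):
    """Hypothetical two-headed output layer (an assumption, not the paper's network).

    One pass over `features` yields both a predicted quality score in [0, 1]
    and a log-variance that acts as the aleatoric uncertainty estimate.
    """
    mean = 1.0 / (1.0 + np.exp(-(features @ w_mean)))  # sigmoid keeps the score in [0, 1]
    log_var = features @ w_log_var                      # unconstrained log-variance
    return mean, log_var


def gaussian_nll(y, mean, log_var):
    """Per-sample Gaussian negative log-likelihood with the constant term dropped.

    The squared error is down-weighted where the predicted variance is high,
    while the 0.5 * log_var term penalizes predicting high variance everywhere.
    """
    return 0.5 * np.exp(-log_var) * (y - mean) ** 2 + 0.5 * log_var


# Usage with made-up inputs: 3 samples, 2 features each.
features = np.zeros((3, 2))
mean, log_var = forward(features, np.ones(2), np.ones(2))
loss = gaussian_nll(np.array([0.5, 0.5, 0.5]), mean, log_var).mean()
```

Under this loss, a sample with a large error costs less when the network also reports a large variance for it, which is how noisy labels come to be associated with higher predicted uncertainty.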
