UNCERTAINTY MODELING AND INTERPRETABILITY IN CONVOLUTIONAL NEURAL NETWORKS FOR POLYP SEGMENTATION

Convolutional Neural Networks (CNNs) are propelling advances in a range of computer vision tasks such as object detection and object segmentation. Their success has motivated research into applying such models to medical image analysis. If CNN-based models are to be helpful in a medical context, they need to be precise and interpretable, and the uncertainty in their predictions must be well understood. In this paper, we develop and evaluate recent advances in uncertainty estimation and model interpretability in the context of semantic segmentation of polyps from colonoscopy images. We evaluate and enhance several Fully Convolutional Network (FCN) architectures for semantic segmentation of colorectal polyps and compare these models. Our highest-performing model achieves a mean IoU of 76.06% on the EndoScene dataset, a considerable improvement over the previous state of the art.
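For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection-over-union (IoU) score can be computed for a predicted and a ground-truth label map. It is illustrative only: the function name `mean_iou`, the flattened-list input, and the choice to skip classes absent from both maps are assumptions, and the abstract does not state the exact averaging protocol used for the EndoScene result.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes for two flattened label maps (hypothetical helper)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(intersection / union)
    if not ious:
        raise ValueError("none of the classes occur in pred or target")
    return sum(ious) / len(ious)


# Toy example with two classes (0 = background, 1 = polyp).
pred = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
print(f"mean IoU = {score:.2f}")
```

In the toy example each class has an intersection of 2 pixels and a union of 4, so both per-class IoU values are 0.5 and so is their mean.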
