An Extensive Study on Cross-Dataset Bias and Evaluation Metrics Interpretation for Machine Learning Applied to Gastrointestinal Tract Abnormality Classification

Precise and efficient automated identification of gastrointestinal (GI) tract diseases can help doctors treat more patients and improve the rate of disease detection and identification. Automatic analysis of diseases in the GI tract is currently a hot topic in both computer science and medical journals. Nevertheless, the evaluation of such automatic analysis is often incomplete or simply wrong. Algorithms are often tested only on small and biased datasets, and cross-dataset evaluations are rarely performed. A clear understanding of evaluation metrics, and of how machine learning models behave across datasets, is crucial to bring research in the field to a new quality level. Toward this goal, we present comprehensive evaluations of five distinct machine learning models, using global features and deep neural networks, that can classify 16 different key types of GI tract conditions, including pathological findings, anatomical landmarks, polyp removal conditions, and normal findings, from images captured by common GI tract examination instruments. In our evaluation, we introduce performance hexagons built from six performance metrics (recall, precision, specificity, accuracy, F1-score, and the Matthews correlation coefficient) to demonstrate how to determine the real capabilities of models rather than evaluating them shallowly. Furthermore, we perform cross-dataset evaluations using different datasets for training and testing. With these cross-dataset evaluations, we demonstrate the challenge of actually building a generalizable model that could be used across different hospitals. Our experiments clearly show that more sophisticated performance metrics and evaluation methods need to be applied to obtain reliable models, rather than depending on evaluations of splits of the same dataset; that is, the performance metrics should always be interpreted together rather than relying on a single metric.
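As an illustration of why the six metrics named above are best read together, the following minimal sketch computes them from the confusion counts of a single class treated one-vs-rest. The function name `hexagon_metrics`, the zero-denominator convention, and the example counts are assumptions made for this sketch; they are not taken from the study, and the way the study aggregates metrics across its 16 classes is not shown here.

```python
import math


def hexagon_metrics(tp, fp, tn, fn):
    """Return the six metrics for one class from its confusion counts.

    Assumption: any ratio with a zero denominator is reported as 0.0.
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    # Denominator of the Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "recall": safe_div(tp, tp + fn),
        "precision": safe_div(tp, tp + fp),
        "specificity": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical, heavily imbalanced class: 50 positives among 1000 images,
# of which the classifier finds only 5.
example = hexagon_metrics(tp=5, fp=5, tn=945, fn=45)
for name, value in example.items():
    print(f"{name:12s} {value:.3f}")
```

With these made-up counts, accuracy and specificity are both high while recall is low, which is the kind of discrepancy that a single-metric report would hide.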
