Two-Stream Deep Feature Modelling for Automated Video Endoscopy Data Analysis

Automating the analysis of imagery of the gastrointestinal (GI) tract captured during endoscopy procedures has substantial potential benefits for patients, as it can provide diagnostic support to medical practitioners and reduce mistakes caused by human error. To further the development of such methods, we propose a two-stream model for endoscopic image analysis. Our model fuses two streams of deep feature inputs by mapping their inherent relations through a novel relational network model, to better model symptoms and classify the image. In contrast to handcrafted feature-based models, our proposed network learns features automatically and outperforms existing state-of-the-art methods on two public datasets: KVASIR and Nerthus. Our extensive evaluations illustrate the importance of having two streams of inputs instead of a single stream and also demonstrate the merits of the proposed relational network architecture for combining those streams.
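
To make the fusion idea concrete, the following is a minimal sketch of a generic pairwise relational fusion of two feature streams, written in plain Python. It is not the architecture proposed here, whose details the abstract does not give. It assumes each stream yields a small set of feature vectors, scores every cross-stream pair with a shared one-layer network `g`, sums the pair responses, and maps the pooled vector to class logits with a second layer `f`. All names (`relational_fusion`, `init_linear`), the feature dimensions, the hidden size, and the number of classes are hypothetical placeholders, and the weights are random rather than trained.

```python
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(x, 0.0) for x in v]


def linear(v, weights, bias):
    """Affine map; `weights` holds one row of len(v) per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def init_linear(n_in, n_out, rng):
    """Random (untrained) weights and zero bias, for illustration only."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def relational_fusion(feats_a, feats_b, g, f):
    """Fuse two streams by pooling a shared network over all cross-stream pairs.

    feats_a, feats_b: lists of feature vectors, one list per stream.
    g, f: (weights, bias) tuples for the pair network and the classifier.
    """
    hidden = len(g[1])
    pooled = [0.0] * hidden
    for a in feats_a:
        for b in feats_b:
            # a + b concatenates the two vectors into one pair descriptor.
            h = relu(linear(a + b, *g))
            pooled = [p + x for p, x in zip(pooled, h)]
    # Sum pooling makes the result independent of the order of the pairs.
    return linear(pooled, *f)


rng = random.Random(0)
# Hypothetical sizes: 4 vectors of dim 16 in stream A, 6 of dim 8 in stream B.
feats_a = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]
feats_b = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
g = init_linear(16 + 8, 32, rng)   # pair network
f = init_linear(32, 5, rng)        # classifier over 5 placeholder classes
logits = relational_fusion(feats_a, feats_b, g, f)
print(len(logits))
```

Because the pair responses are summed, the output does not depend on the order in which either stream's features are presented. A trained model would learn `g` and `f` jointly with the feature extractors.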
