An empirical study on the effect of imbalanced data on bleeding detection in endoscopic video

In biomedical applications, including the classification of endoscopic videos, class imbalance is a common problem that arises from large differences between the prior probabilities of the classes. In this paper, we investigate how different classifiers perform on the bleeding detection problem as the training data distribution varies, using three experiments. In the first experiment, we analyze classifier performance under different class distributions with a fixed-size training dataset. This experiment indicates the class distribution required for optimum classification performance. In the second and third experiments, we investigate the effect of both training data size and class distribution on classification performance. Our experiments show that a larger dataset with moderate class imbalance yields better classification performance than a small dataset with a balanced distribution. Ensemble classifiers are also more robust to variation in the training dataset than single classifiers.
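The fixed-size setup of the first experiment can be illustrated with a minimal sketch. It is not the authors' code: the function name `sample_with_class_ratio`, the pool of 200 bleeding and 800 non-bleeding frames, the training size of 200, and the tested fractions are all assumptions made for illustration. The sketch only shows how a training subset of constant size can be drawn at different class distributions, so that a classifier could then be trained and scored on each subset.

```python
import random


def sample_with_class_ratio(labels, n_total, pos_fraction, seed=0):
    """Return indices of a subset of size n_total in which
    round(n_total * pos_fraction) samples carry label 1 (bleeding)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(n_total * pos_fraction)
    n_neg = n_total - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough samples in the pool for this ratio")
    # Draw each class without replacement, then mix the two groups.
    chosen = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(chosen)
    return chosen


# Hypothetical pool: 200 bleeding frames (1) and 800 non-bleeding frames (0).
pool_labels = [1] * 200 + [0] * 800

# Keep the training size fixed and vary only the class distribution.
for frac in (0.1, 0.3, 0.5):
    idx = sample_with_class_ratio(pool_labels, n_total=200, pos_fraction=frac)
    n_pos = sum(pool_labels[i] for i in idx)
    print(f"target {frac:.1f}: {n_pos} bleeding / {len(idx) - n_pos} non-bleeding")
```

Varying `n_total` together with `pos_fraction` gives the kind of joint sweep over training size and class distribution that the second and third experiments describe.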
