Clustering algorithms on imbalanced data using the SMOTE technique for image segmentation

Imbalanced data is a critical problem in machine learning. Most imbalanced datasets contain one or more classes, called minority classes, that have too few samples to be recognized reliably. Many traditional classification algorithms are unable to recognize the minority class effectively. Clustering algorithms used for image segmentation may achieve high overall accuracy even when none of the samples in the minority class is classified correctly. In this study, we use three approaches, traditional oversampling, traditional undersampling, and the Synthetic Minority Over-sampling Technique (SMOTE), to reduce the difference in the number of samples between the majority and minority classes in the dataset. The Fuzzy C-means algorithm (FCM) and the Possibilistic Clustering Algorithm (PCA) are then used to segment images whose samples are generated with these sampling methods. Experimental results are evaluated using the Kappa coefficient and the confusion matrix. Our evaluation shows that oversampling, undersampling, and SMOTE can each improve segmentation accuracy on imbalanced image data [1].
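
As an illustration of the SMOTE idea referred to above, the following is a minimal sketch and not the implementation used in this study. It assumes each sample is a numeric feature vector (here, hypothetical RGB pixel values) and creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the example data are assumptions made for illustration. The sketch also simplifies the original procedure by drawing base samples at random and using a single interpolation factor per synthetic sample.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE sketch: interpolate between minority samples
    and their k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a sample has at most n - 1 neighbours

    # Pairwise squared Euclidean distances within the minority class.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # randomly chosen base samples
    pick = rng.integers(0, k, size=n_new)   # one of the k neighbours each
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return minority[base] + gap * (minority[nb] - minority[base])

# Hypothetical minority-class pixels (RGB features), for illustration only.
minority_pixels = np.array(
    [[200, 30, 30], [210, 40, 35], [190, 25, 40], [205, 35, 30]], dtype=float
)
synthetic = smote(minority_pixels, n_new=6, k=2)
balanced_minority = np.vstack([minority_pixels, synthetic])
print(balanced_minority.shape)  # (10, 3)
```

Because every synthetic sample lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class, unlike plain oversampling, which only duplicates existing samples.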

[1] Chumphol Bunkhumpornpat, et al. Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem, 2009, PAKDD.

[2] Francisco Herrera, et al. SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory, 2012, Knowledge and Information Systems.

[3] Patricia S. Abril. Intro_ lo, 2006.

[4] Dimitris Kanellopoulos, et al. Handling imbalanced datasets: A review, 2006.

[5] Xia Li, et al. Robustness of Quantitative Compressive Sensing MRI: The Effect of Random Undersampling Patterns on Derived Parameters for DCE- and DSC-MRI, 2012, IEEE Transactions on Medical Imaging.

[6] Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[7] James C. Bezdek, et al. Pattern Recognition with Fuzzy Objective Function Algorithms, 1981, Advanced Applications in Pattern Recognition.

[8] J. C. Noordam, et al. Multivariate image segmentation with cluster size insensitive fuzzy C-means, 2002.

[9] Isak Gath, et al. Unsupervised Optimal Fuzzy Clustering, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[10] Harriet Small, et al. Handling Unbalanced Data in Deep Image Segmentation, 2017.

[11] Shuiping Gou, et al. Classification of imbalanced hyperspectral imagery data using support vector sampling, 2014, 2014 IEEE Geoscience and Remote Sensing Symposium.

[12] Zhong Liu, et al. An adaptive resampling algorithm based on CFSFDP, 2017, 2017 2nd IEEE International Conference on Computational Intelligence and Applications (ICCIA).

[13] Hui Han, et al. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning, 2005, ICIC.

[14] Minseok Kim, et al. Synthetic minority over-sampling technique based on fuzzy c-means clustering for imbalanced data, 2017, 2017 International Conference on Fuzzy Theory and Its Applications (iFUZZY).

[15] Po-Whei Huang, et al. A size-insensitive integrity-based fuzzy c-means method for data clustering, 2014, Pattern Recognit.

[16] James C. Bezdek, et al. Partially supervised clustering for image segmentation, 1996, Pattern Recognit.

[17] James M. Keller, et al. The possibilistic C-means algorithm: insights and recommendations, 1996, IEEE Trans. Fuzzy Syst.

[18] Francisco Herrera, et al. SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering, 2015, Inf. Sci.

[19] David A. McAllester, et al. A discriminatively trained, multiscale, deformable part model, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.