An intelligent undersampling technique based upon intuitionistic fuzzy sets to alleviate class imbalance problem of classification with noisy environment

Traditional classification algorithms (TCA) perform poorly when class sizes are unequal. In some applications the goal is to discover exceptional or rare cases, such as fraudulent transactions in a credit card database or fraudulent mobile calls. When TCA are applied in such cases, they fail to detect the rare cases; this is known as the class imbalance problem. The problem becomes more serious when the data distribution contains other impurities, such as noise, overlapping classes and imbalance within classes. This paper presents an intelligent undersampling method and an ensemble-based classification method to resolve the class imbalance problem in noisy situations. A synthetic dataset with varying levels of noise is used to assess the classification performance of the proposed techniques. The results indicate that, in noisy situations, the proposed undersampling and ensemble-based classifiers achieve better classification performance than random undersampling (RUS) and SMOTE combined with classifiers such as C4.5, RIPPER, KNN, SVM, MLP and Naive Bayes, and than the ensemble techniques boosting, bagging and random forest.
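The abstract does not spell out how intuitionistic fuzzy sets drive the undersampling. As a rough illustration only, the Python sketch below shows one way an intuitionistic fuzzy set (membership μ, non-membership ν, hesitation π = 1 - μ - ν) could guide undersampling of the majority class. It is not the authors' algorithm. The two-centroid, fuzzy C-means style membership, the Sugeno-type generator ν = (1 - μ)/(1 + λμ) with λ = 2, the rule of discarding majority instances with μ < ν as likely noise or overlap, the final random sampling down to the minority size, and all function names are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty sequence of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def majority_membership(x: Point, c_maj: Point, c_min: Point) -> float:
    """Fuzzy C-means style membership (fuzzifier m = 2) of x in the majority
    cluster when only the two class centroids are considered (assumption)."""
    d_maj, d_min = sq_dist(x, c_maj), sq_dist(x, c_min)
    if d_maj + d_min == 0.0:
        return 0.5  # x coincides with both centroids: maximally ambiguous
    return d_min / (d_maj + d_min)


def sugeno_ifs(mu: float, lam: float = 2.0) -> Tuple[float, float, float]:
    """Turn a fuzzy membership into an intuitionistic triple (mu, nu, pi)
    using the Sugeno-type generator nu = (1 - mu) / (1 + lam * mu), lam > 0.
    For lam > 0, nu <= 1 - mu, so the hesitation pi = 1 - mu - nu is >= 0."""
    nu = (1.0 - mu) / (1.0 + lam * mu)
    return mu, nu, 1.0 - mu - nu


def ifs_undersample(majority: Sequence[Point], minority: Sequence[Point],
                    lam: float = 2.0, seed: int = 0) -> List[Point]:
    """Hypothetical IFS-guided undersampling of the majority class.

    1. Drop majority points whose non-membership exceeds their membership
       (treated here as likely noise or class overlap).
    2. Randomly sample the survivors down to the minority-class size.
    """
    c_maj, c_min = centroid(majority), centroid(minority)
    survivors = []
    for x in majority:
        mu, nu, _pi = sugeno_ifs(majority_membership(x, c_maj, c_min), lam)
        if mu >= nu:  # more evidence for majority membership than against
            survivors.append(x)
    if len(survivors) <= len(minority):
        return survivors
    return random.Random(seed).sample(survivors, len(minority))
```

For example, with four majority points clustered near the origin and one majority point lying inside the minority cluster, that outlier receives μ close to 0, so it is discarded before the random sampling step. Filtering before sampling is the design choice that separates this sketch from plain RUS, which would keep such a point with the same probability as any other.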
