Oversampling Algorithm Based on Spatial Distribution of Data Sets for Imbalance Learning

The class imbalance problem is widespread in machine learning. Most learning algorithms fail to achieve satisfactory performance on imbalanced data sets, because their decisions are easily biased toward the majority class. This paper proposes SDSMOTE, a method that captures the spatial distribution of an imbalanced data set and shifts the tendency of the learning algorithm by oversampling minority samples according to their recognition difficulty. Experiments on 5 UCI data sets validate the effectiveness of this oversampling algorithm.
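The abstract does not spell out how SDSMOTE measures recognition difficulty, so the following is only a minimal sketch of the general idea of difficulty-guided oversampling, not the authors' algorithm. It assumes that difficulty can be approximated by the fraction of majority-class samples among a minority sample's k nearest neighbours (a choice similar in spirit to ADASYN [13]) and that new samples are generated by SMOTE-style interpolation [6]. The function name `difficulty_weighted_oversample` and all of its parameters are hypothetical.

```python
import numpy as np

def difficulty_weighted_oversample(X, y, minority_label, n_new, k=5, seed=0):
    """Illustrative sketch only; the difficulty measure is an assumption.

    Requires at least two minority samples. Returns the synthetic samples
    and the per-minority-sample difficulty scores.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    X_min = X[min_idx]

    # Distances from every minority sample to every sample in the data set.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(len(min_idx)), min_idx] = np.inf  # ignore self-distance
    nn_all = np.argsort(d_all, axis=1)[:, :k]

    # Assumed difficulty: share of majority samples among the k neighbours.
    difficulty = (y[nn_all] != minority_label).mean(axis=1)

    # Harder samples are picked more often; the small floor keeps every
    # minority sample selectable even when its difficulty is zero.
    weights = difficulty + 1e-3
    probs = weights / weights.sum()

    # Minority-only neighbours, used as interpolation partners.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(k, len(min_idx) - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]

    # SMOTE-style interpolation between a seed and one of its neighbours.
    seeds = rng.choice(len(min_idx), size=n_new, p=probs)
    partners = nn_min[seeds, rng.integers(0, k_min, size=n_new)]
    gaps = rng.random((n_new, 1))
    X_new = X_min[seeds] + gaps * (X_min[partners] - X_min[seeds])
    return X_new, difficulty
```

Calling `difficulty_weighted_oversample(X, y, minority_label=1, n_new=100)` would return 100 synthetic minority samples to append to the training set before fitting a classifier. Because each synthetic sample is a convex combination of two minority samples, it always lies on the segment between them.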

[1]  Shihai Wang,et al.  CMO-SMOTE: Misclassification Cost Minimization Oriented Synthetic Minority Oversampling Technique for Imbalanced Learning , 2016, 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC).

[2]  Hui Han,et al.  Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning , 2005, ICIC.

[3]  Daniel S. Yeung,et al.  Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems , 2015, IEEE Transactions on Cybernetics.

[4]  Hewijin Christine Jiau,et al.  Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem , 2006 .

[5]  BingHao Yan,et al.  A novel region adaptive SMOTE algorithm for intrusion detection on imbalanced problem , 2017, 2017 3rd IEEE International Conference on Computer and Communications (ICCC).

[6]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[7]  David Zhang,et al.  Evolutionary Cost-Sensitive Extreme Learning Machine , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[8]  Scott Dick,et al.  Comparing nearest-neighbour search strategies in the SMOTE algorithm , 2006, Canadian Journal of Electrical and Computer Engineering.

[9]  Yi-Hung Liu,et al.  Face Recognition Using Total Margin-Based Adaptive Fuzzy Support Vector Machines , 2007, IEEE Transactions on Neural Networks.

[10]  Zhi-Hua Zhou,et al.  Exploratory Under-Sampling for Class-Imbalance Learning , 2006, ICDM.

[11]  Hendry,et al.  Deep learning to predict user rating in imbalance classification data incorporating ensemble methods , 2018, 2018 IEEE International Conference on Applied System Invention (ICASI).

[12]  Nitesh V. Chawla,et al.  SMOTEBoost: Improving Prediction of the Minority Class in Boosting , 2003, PKDD.

[13]  Haibo He,et al.  ADASYN: Adaptive synthetic sampling approach for imbalanced learning , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[14]  Zhang Chunkai,et al.  A new sampling approach for classification of imbalanced data sets with high density , 2014, 2014 International Conference on Big Data and Smart Computing (BIGCOMP).

[15]  Dong Yue,et al.  Prediction of wind turbine blades icing based on MBK-SMOTE and random forest in imbalanced data set , 2017, 2017 IEEE Conference on Energy Internet and Energy System Integration (EI2).