Fuzzy-Based Feature and Instance Recovery

The severe class distribution shews the presence of under-represented data, which has great effects on the performance of learning algorithm, is still a challenge of data mining and machine learning. Lots of researches currently focus on experimental comparison of the existing re-sampling approaches. We believe it requires new ways of constructing better algorithms to further balance and analyse the data set. This paper presents a Fuzzy-based Information Decomposition oversampling (FIDoS) algorithm used for handling the imbalanced data. Generally speaking, this is a new way of addressing imbalanced learning problems from missing data perspective. First, we assume that there are missing instances in the minority class that result in the imbalanced dataset. Then the proposed algorithm which takes advantages of fuzzy membership function is used to transfer information to the missing minority class instances. Finally, the experimental results demonstrate that the proposed algorithm is more practical and applicable compared to sampling techniques.

[1]  Sebastián Ventura,et al.  Weighted Data Gravitation Classification for Standard and Imbalanced Data , 2013, IEEE Transactions on Cybernetics.

[2]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[3]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[4]  Xin Yao,et al.  MWMOTE--Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning , 2014 .

[5]  Tim Menzies,et al.  The \{PROMISE\} Repository of Software Engineering Databases. , 2005 .

[6]  Xin Yao,et al.  Dynamic Sampling Approach to Training Neural Networks for Multiclass Imbalance Classification , 2013, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[8]  Taghi M. Khoshgoftaar,et al.  Experimental perspectives on learning from imbalanced data , 2007, ICML '07.

[9]  Huang Chong-fu,et al.  Demonstration of benefit of information distribution for probability estimation , 2000 .

[10]  María José del Jesús,et al.  KEEL: a software tool to assess evolutionary algorithms for data mining problems , 2008, Soft Comput..

[11]  M. Mostafizur Rahman,et al.  Cluster Based Under-Sampling for Unbalanced Cardiovascular Data , 2013 .

[12]  Paul M. Thompson,et al.  Analysis of sampling techniques for imbalanced data: An n=648 ADNI study , 2014, NeuroImage.

[13]  Haibo He,et al.  Learning from Imbalanced Data , 2009, IEEE Transactions on Knowledge and Data Engineering.

[14]  Xindong Wu,et al.  10 Challenging Problems in Data Mining Research , 2006, Int. J. Inf. Technol. Decis. Mak..

[15]  Min Gan,et al.  Information-decomposition-model-based missing value estimation for not missing at random dataset , 2015, International Journal of Machine Learning and Cybernetics.

[16]  Alfredo Petrosino,et al.  Adjusted F-measure and kernel scaling for imbalanced data learning , 2014, Inf. Sci..