MIAC: Mutual-Information Classifier with ADASYN for Imbalanced Classification

Classification of imbalanced data is a significant problem in data mining and machine learning, because most data sets are imbalanced. Cost-Sensitive Learning (CSL) is an effective solution, but it cannot work properly when the costs are not given. As a Cost-Free Learning (CFL) method, Mutual-Information Classification (MIC) can obtain optimal classification results when cost information is not given. However, MIC places too much emphasis on the minority class and neglects classification accuracy on the majority class. To address this, this paper presents a CFL method called Mutual-Information-ADASYN Classification (MIAC). First, we use MIC to identify the abstained samples, which are hard to classify. Second, we use these abstained samples to synthesize new instances with ADASYN. Third, we build the Mutual-Information-ADASYN classifier using the new samples. Finally, we use this classifier to obtain the final results. We evaluated MIAC on several imbalanced binary data sets with different imbalance ratios. The experimental results indicate that MIAC is more effective than MIC at handling imbalanced data sets.
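The first two steps of the pipeline (abstain on hard samples, then synthesize new minority instances from them) might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `abstain_mask` and `adasyn_like`, the fixed reject thresholds `t_low` and `t_high`, and the logistic stand-in scores are all hypothetical. In MIC the reject thresholds would come from maximizing mutual information, which is not reproduced here, and the multinomial allocation of synthetic samples is a simplification of ADASYN's rounding rule.

```python
import numpy as np


def abstain_mask(scores, t_low, t_high):
    """Mark samples whose minority-class score falls inside the reject band.

    t_low and t_high are hypothetical fixed thresholds for this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores > t_low) & (scores < t_high)


def adasyn_like(X, y, seeds, n_new, k=5, minority=1, rng=None):
    """Synthesize n_new minority points, ADASYN-style, from seed rows only.

    seeds is a boolean mask of rows eligible as seeds (e.g. abstained samples).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    seeds = np.asarray(seeds, dtype=bool)
    min_idx = np.flatnonzero(y == minority)
    seed_idx = np.flatnonzero(seeds & (y == minority))
    if len(seed_idx) == 0 or len(min_idx) < 2 or n_new <= 0:
        return np.empty((0, X.shape[1]))

    # Hardness r_i: fraction of majority points among the k nearest neighbours.
    d = np.linalg.norm(X[seed_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(seed_idx)), seed_idx] = np.inf  # exclude the seed itself
    kk = min(k, len(X) - 1)
    nn = np.argsort(d, axis=1)[:, :kk]
    r = (y[nn] != minority).mean(axis=1)

    # Normalize r_i to a distribution; fall back to uniform if all are zero.
    if r.sum() > 0:
        p = r / r.sum()
    else:
        p = np.full(len(seed_idx), 1.0 / len(seed_idx))

    # Assumption: draw per-seed counts from a multinomial so the total is
    # exactly n_new (ADASYN itself rounds r_i * G per sample).
    counts = rng.multinomial(n_new, p)

    out = []
    for i, g in zip(seed_idx, counts):
        if g == 0:
            continue
        # Interpolate between the seed and randomly chosen minority neighbours.
        others = min_idx[min_idx != i]
        dm = np.linalg.norm(X[others] - X[i], axis=1)
        nbrs = others[np.argsort(dm)[:k]]
        picks = rng.choice(nbrs, size=g)
        lam = rng.random((g, 1))
        out.append(X[i] + lam * (X[picks] - X[i]))
    return np.vstack(out)


# Toy usage: 90 majority points, 10 minority points.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(90, 2)),
               data_rng.normal(2.0, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

# Stand-in for a base classifier's minority-class probability (hypothetical).
scores = 1.0 / (1.0 + np.exp(-(X.sum(axis=1) - 2.0)))
rejected = abstain_mask(scores, 0.3, 0.7)
X_new = adasyn_like(X, y, rejected, n_new=20, rng=0)
```

The synthetic rows in `X_new` would then be appended to the training set with the minority label before the final classifier is trained. Restricting seeds to abstained samples concentrates the new instances near the decision boundary, which is the idea the abstract describes.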
