EasyEnsemble and Feature Selection for Imbalance Data Sets

Many labeled data sets have an unbalanced representation of their classes. When the imbalance is large, classification accuracy on the smaller class tends to be lower. In particular, when a class is of great interest but occurs relatively rarely, as with cases of fraud or instances of disease, it is important to identify it accurately. Here we propose a novel algorithm named MIEE (Mutual Information based feature selection for EasyEnsemble) to treat this problem and improve the generalization performance of the EasyEnsemble classifier. Experimental results on UCI data sets show that MIEE obtains better performance than asymmetric bagging and EasyEnsemble.
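The abstract does not spell out how feature selection and EasyEnsemble are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes discrete-valued features, ranks features by empirical mutual information with the class label inside each balanced subset, and keeps the top k. EasyEnsemble draws several majority-class subsets of the same size as the minority class and trains a learner on each; the original method trains AdaBoost per subset, whereas a nearest-centroid classifier stands in here to keep the example short. All function names and the parameters `n_subsets`, `k`, and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in nats, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x, y) / (p(x) p(y)) simplifies to count_xy * n / (count_x * count_y).
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def top_k_features(X, y, k):
    """Indices of the k features with the highest mutual information with y."""
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_centroids(X, y, feats):
    """Stand-in base learner (assumption): per-class mean over selected features."""
    model = {}
    for label in set(y):
        rows = [[row[j] for j in feats] for row, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_centroids(model, feats, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    v = [row[j] for j in feats]
    return min(model,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(v, model[c])))


def fit_ensemble(X, y, minority, n_subsets=5, k=1, seed=0):
    """Train one (feature subset, model) pair per balanced undersampled subset."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == minority]
    neg = [i for i, t in enumerate(y) if t != minority]
    members = []
    for _ in range(n_subsets):
        # All minority examples plus an equally sized random majority sample.
        idx = pos + rng.sample(neg, len(pos))
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        feats = top_k_features(Xs, ys, k)
        members.append((feats, fit_centroids(Xs, ys, feats)))
    return members


def predict(members, row):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_centroids(model, feats, row)
                    for feats, model in members)
    return votes.most_common(1)[0][0]


# Toy data: feature 0 equals the label, feature 1 alternates, feature 2 is constant.
X = [[1, i % 2, 0] for i in range(4)] + [[0, i % 2, 0] for i in range(20)]
y = [1] * 4 + [0] * 20
members = fit_ensemble(X, y, minority=1)
```

In this toy setting every balanced subset selects feature 0, because it is the only feature that determines the label. On real data the selected features may differ from subset to subset, which is one way such a scheme could add diversity to the ensemble.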
