Feature Selection in Imbalanced Data Sets

Feature selection methods are now used in a variety of fields, including information retrieval and filtering, text classification, risk management, web categorization, medical diagnosis, and credit card fraud detection. In this paper we focus on feature selection for imbalanced problems. Class imbalance is one of the greatest challenges in machine learning and data mining research. It can appear in two types of data sets: binary problems, where one of the two classes comprises considerably more samples than the other, and multiclass problems, where each class contains only a tiny fraction of the samples. We aim to provide prior knowledge for an expert system that can indicate which feature selection metrics perform best given the characteristics of the data, regardless of the classifier used.
