Improved automatic filtering algorithm for imbalanced classification based on SVM-RFE

Almost all imbalanced classification algorithms focus on maximizing the balance degree of the data set, that is, on removing the negative samples that are useless for classifier training while keeping as many positive samples and useful samples as possible. However, we find that the best balance degree does not necessarily yield the highest classification accuracy. In this paper, we propose a new method for imbalanced classification that combines SVM-RFE (Support Vector Machine Recursive Feature Elimination) with an automatic filtering algorithm. First, SVM-RFE is applied to select the most discriminative features. Second, the combination of these features is used in the automatic filtering algorithm to extract filtering rules, which remove the samples that have no effect or a negative effect on classifier training and testing on imbalanced data sets. Experimental results demonstrate that the proposed method achieves higher classification accuracy. In addition, our approach significantly shortens the training time.
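The feature-selection step can be illustrated with a minimal sketch of generic SVM-RFE: train a linear SVM, rank features by squared weight, drop the lowest-ranked feature, and repeat. This is not the authors' implementation. The function names (`train_linear_svm`, `svm_rfe`), the subgradient-descent trainer, its hyperparameters, and the toy data are all assumptions made for illustration. The filtering rules themselves are not sketched, because the abstract does not specify them.

```python
def train_linear_svm(X, y, epochs=300, lr=0.1, lam=0.01):
    """Fit a linear SVM by full-batch subgradient descent on the
    L2-regularized hinge loss (hypothetical stand-in for a real solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: repeatedly retrain and drop the
    feature with the smallest squared weight until n_keep remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


# Toy imbalanced data (made up): 3 positive vs 9 negative samples.
# Columns 0 and 2 carry the class signal; columns 1 and 3 are small noise.
pos = [[2.0, 0.1, 1.5, -0.1], [2.5, -0.1, 1.0, 0.1], [3.0, 0.1, 2.0, 0.1]]
neg = [[-2.0 - 0.5 * (i % 3), 0.1 * (-1) ** i,
        -1.0 - 0.5 * (i % 2), 0.1 * (-1) ** (i // 2)] for i in range(9)]
X = pos + neg
y = [1] * len(pos) + [-1] * len(neg)

selected = svm_rfe(X, y, n_keep=2)
print(selected)  # the two signal columns survive: [0, 2]
```

In practice one would likely use an established SVM solver rather than this hand-written trainer; the sketch only shows the ranking-and-elimination loop that SVM-RFE is built around.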
