Pruning support vectors for imbalanced data classification

In many practical applications, learning from imbalanced data poses a significant challenge, and one the machine learning community faces increasingly often. Class imbalance raises issues that are absent, or less severe, when the classes are balanced. This paper presents a new method for imbalanced data classification that combines support vector machine (SVM) classifiers with a backward pruning technique. Experimental results on two data sets demonstrate the effectiveness of the new algorithm.
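The abstract does not spell out the pruning procedure. The following is a rough illustration only, not the authors' algorithm. It assumes one plausible reading of "backward pruning": train a linear SVM, then greedily drop majority-class support vectors whenever retraining without them does not lower balanced accuracy on a held-out validation set. The Pegasos-style linear trainer, the balanced-accuracy criterion, and every function and parameter name (`train_linear_svm`, `backward_prune`, `lam`, `epochs`) are hypothetical choices made for this sketch.

```python
import random

# Hypothetical illustration of backward pruning around an SVM; NOT the paper's method.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def augment(X):
    """Append a constant 1.0 feature so the bias is learned as an ordinary weight."""
    return [list(x) + [1.0] for x in X]


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: bias-augmented feature lists; y: labels in {-1, +1}. Returns weights w.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # hinge loss is active for this point
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def balanced_accuracy(w, X, y):
    """Mean per-class recall, which is not dominated by the majority class."""
    recalls = []
    for cls in (-1, 1):
        members = [x for x, t in zip(X, y) if t == cls]
        if members:
            hits = sum(predict(w, x) == cls for x in members)
            recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


def backward_prune(X, y, X_val, y_val, majority=-1, **svm_kw):
    """Greedy backward elimination of majority-class support vectors (assumed criterion).

    A candidate is removed only if retraining without it does not lower
    validation balanced accuracy. Minority-class examples are never removed.
    """
    keep = list(range(len(X)))

    def fit(ids):
        return train_linear_svm([X[i] for i in ids], [y[i] for i in ids], **svm_kw)

    w = fit(keep)
    best = balanced_accuracy(w, X_val, y_val)
    # Support vectors of the initial model: points on or inside the margin.
    candidates = [i for i in keep if y[i] == majority and y[i] * dot(w, X[i]) <= 1]
    for i in candidates:
        trial = [j for j in keep if j != i]
        w_trial = fit(trial)
        score = balanced_accuracy(w_trial, X_val, y_val)
        if score >= best:  # ties favour the smaller training set
            keep, w, best = trial, w_trial, score
    return keep, w, best
```

Because a removal is accepted only when validation balanced accuracy does not drop, the pruned model never scores below the unpruned one on that validation set. Whether this also helps on unseen data depends on the data, and a kernel SVM or a different pruning criterion may be closer to what the paper actually does.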
