Majority-Class Aware Support Vector Domain Oversampling for Imbalanced Classification Problems

In this work, a method is presented to overcome the difficulties posed by imbalanced classification problems. The proposed algorithm fits a support vector domain description to the minority class but, in contrast to many other algorithms, uses awareness of the majority-class samples to improve the estimation process. The majority samples are incorporated into the optimization procedure, and the resulting domain descriptions are generally superior to those obtained without knowledge of the majority class. Extensive experimental results support the validity of this approach.
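The sketch below illustrates the general idea in Python, under simplifying assumptions: a spherical data description is fitted to the minority class in input space (no kernel), a penalty term discourages majority samples from falling inside the description, and synthetic minority samples are then drawn from the learned domain. The function names, the penalty weights `C_min`/`C_maj`, and the uniform-in-sphere oversampler are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a majority-class aware data description (assumption:
# the kernelized SVDD is approximated by a soft sphere with center a and
# squared radius R^2, optimized via a penalty formulation).
import numpy as np
from scipy.optimize import minimize


def fit_majority_aware_sphere(X_min, X_maj, C_min=1.0, C_maj=1.0):
    """Fit a sphere enclosing the minority class while penalizing
    enclosed majority samples (C_min, C_maj are hypothetical weights)."""
    d = X_min.shape[1]

    def objective(params):
        a, R2 = params[:d], params[d]
        dist_min = np.sum((X_min - a) ** 2, axis=1)
        dist_maj = np.sum((X_maj - a) ** 2, axis=1)
        # radius term + slack for minority points outside the sphere
        #             + slack for majority points inside the sphere
        return (R2
                + C_min * np.sum(np.maximum(0.0, dist_min - R2))
                + C_maj * np.sum(np.maximum(0.0, R2 - dist_maj)))

    a0 = X_min.mean(axis=0)
    R2_0 = np.mean(np.sum((X_min - a0) ** 2, axis=1))
    res = minimize(objective, np.append(a0, R2_0), method="Nelder-Mead")
    return res.x[:d], max(res.x[d], 1e-12)


def oversample_inside_sphere(center, R2, n_samples, seed=None):
    """Draw synthetic minority samples uniformly from the learned sphere."""
    rng = np.random.default_rng(seed)
    d = len(center)
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = np.sqrt(R2) * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    return center + radii * directions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_min = rng.normal(loc=0.0, scale=0.5, size=(30, 2))   # minority class
    X_maj = rng.normal(loc=2.0, scale=1.0, size=(300, 2))  # majority class
    center, R2 = fit_majority_aware_sphere(X_min, X_maj)
    X_new = oversample_inside_sphere(center, R2, n_samples=270, seed=1)
    print("center:", center, "radius:", np.sqrt(R2), "synthetic:", X_new.shape)
```

In the full method a kernelized domain description would typically be used so that the minority-class boundary need not be spherical; this sketch only demonstrates how majority samples can enter the estimation of the description and how the resulting domain can drive oversampling.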
