Barricaded Boundary Minority Oversampling LS-SVM for a Biased Binary Classification

Classifying biased datasets with linearly non-separable features has been a challenge in pattern recognition because traditional classifiers, which are usually skewed towards the majority class, often produce sub-optimal results. If biased or unbalanced data is not processed appropriately, any information extracted from it risks being compromised. Least Squares Support Vector Machines (LS-SVM) is known for its computational advantage over SVM; however, it suffers from a lack of sparsity in its support vectors: it learns the separating hyperplane from the whole dataset and often produces biased hyperplanes on imbalanced datasets. To address the supervised classification of imbalanced datasets, we propose Barricaded Boundary Minority Oversampling (BBMO), which oversamples the minority samples at the boundary in the direction of the closest majority samples to remove the bias that data imbalance introduces into LS-SVM. Two variations of BBMO are studied: BBMO1, for the linearly separable case, uses the Lagrange multipliers to extract boundary samples from both classes; the generalized BBMO2, for the non-linear case, uses the kernel matrix to extract the majority samples closest to each minority sample. In either case, BBMO computes weighted means as new synthetic minority samples and appends them to the dataset. Experiments on several synthetic and real-world datasets show that BBMO with LS-SVM improved on other methods in the literature and motivate follow-on research.
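The oversampling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, so the `weight` and `k` parameters, the RBF kernel choice, and all function names are assumptions made for illustration. The sketch ranks majority samples by kernel-induced distance to each minority sample (in the spirit of BBMO2) and forms a weighted mean in input space as the synthetic minority point.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_sq_distance(a, b, kernel):
    """Squared distance in the kernel-induced feature space:
    k(a, a) + k(b, b) - 2 k(a, b)."""
    return kernel(a, a) + kernel(b, b) - 2.0 * kernel(a, b)


def oversample_toward_majority(minority, majority, k=1, weight=0.25,
                               kernel=rbf_kernel):
    """Return synthetic minority samples placed between each minority sample
    and its k closest majority samples.

    `weight` (hypothetical) is the share given to the majority neighbour in
    the weighted mean; small values keep the new point near the minority sample.
    """
    synthetic = []
    for m in minority:
        # Rank majority samples by feature-space distance to this minority sample.
        order = sorted(range(len(majority)),
                       key=lambda j: kernel_sq_distance(m, majority[j], kernel))
        for j in order[:k]:
            q = majority[j]
            # Weighted mean of the minority sample and its majority neighbour.
            synthetic.append([(1.0 - weight) * a + weight * b
                              for a, b in zip(m, q)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0]]
    majority = [[1.0, 0.0], [5.0, 5.0]]
    new_points = oversample_toward_majority(minority, majority)
    # The synthetic points would be appended to the minority class before
    # training the classifier (LS-SVM in the paper).
    print(new_points)
```

With `weight` below 0.5, each synthetic point stays closer to its minority parent than to the majority neighbour, which pushes the minority class outward at the boundary without crossing into the majority region. Restricting the procedure to boundary samples only, as BBMO does, would need an additional selection step that is not shown here.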
