Reduction of Training Data Using Parallel Hyperplane for Support Vector Machine

ABSTRACT Support Vector Machine (SVM) is an efficient machine learning technique applicable to a wide range of classification problems due to its robustness. However, its training time grows dramatically as the number of training samples increases, which makes SVM impractical for large-scale datasets. In this paper a novel Parallel Hyperplane (PH) scheme is introduced which efficiently removes redundant training data for SVM. In the proposed scheme the PHs are formed recursively, and at each iteration the clusters of data points lying outside the PHs are removed. Computer simulation reveals that the proposed scheme greatly reduces the training time compared to the existing clustering-based reduction scheme and the SMO scheme, while keeping the classification accuracy as high as that obtained without data reduction.
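To make the idea concrete, the sketch below illustrates one plausible reading of the recursive data-reduction step described in the abstract: fit an SVM, keep only the points lying between two hyperplanes parallel to the current decision boundary, discard the rest, and repeat. This is an illustrative sketch only, not the paper's exact PH algorithm; the band width `delta`, the iteration count `n_iter`, and the function name are hypothetical parameters chosen for the example, and scikit-learn's `SVC` stands in for whatever solver the authors actually use.

```python
# Illustrative sketch of a parallel-hyperplane style reduction (assumptions noted above).
import numpy as np
from sklearn.svm import SVC


def reduce_with_parallel_hyperplanes(X, y, delta=1.5, n_iter=3):
    """Iteratively discard points outside a band of two parallel hyperplanes."""
    X_cur, y_cur = X, y
    for _ in range(n_iter):
        clf = SVC(kernel="linear", C=1.0).fit(X_cur, y_cur)
        # decision_function gives w.x + b for each point; support vectors sit at |w.x + b| = 1.
        scores = clf.decision_function(X_cur)
        # Keep only points between the two parallel hyperplanes |w.x + b| = delta;
        # points far outside the band are treated as redundant for training.
        mask = np.abs(scores) <= delta
        if mask.sum() < 2 or len(np.unique(y_cur[mask])) < 2:
            break  # stop if the band no longer contains both classes
        X_cur, y_cur = X_cur[mask], y_cur[mask]
    return X_cur, y_cur


if __name__ == "__main__":
    # Usage example on synthetic data: reduce, then train the final classifier.
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_red, y_red = reduce_with_parallel_hyperplanes(X, y)
    final_clf = SVC(kernel="linear").fit(X_red, y_red)
    print(f"kept {len(X_red)} of {len(X)} samples")
```

The intended benefit is that each retraining pass runs on a progressively smaller set, while the points near the boundary, which are the ones likely to become support vectors, are retained.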
