CBCH (clustering-based convex hull) for reducing training time of support vector machine

The support vector machine (SVM) is an efficient machine learning technique that is widely applied to classification problems because of its robustness. However, its training time grows dramatically as the number of training samples increases, which limits its applicability to large-scale datasets. In an SVM, only a few training samples, called support vectors (SVs), affect the construction of the separating hyperplane. Removing training data irrelevant to the SVs therefore does not degrade the performance of the SVM. In this paper, the clustering-based convex hull (CBCH) scheme is introduced, which efficiently removes insignificant data and thereby reduces the training time of the SVM. The CBCH scheme first applies the k-means clustering algorithm to the given training data points and then obtains the convex hull of each cluster. Only the vertices of the convex hulls and the data points relevant to the SVs are retained as training data points. Computer simulation over datasets of various sizes and types reveals that the proposed scheme is considerably faster and more accurate than the existing SVM classifiers. The proposed algorithm is based on the geometric interpretation of the SVM and is applicable to both linearly separable and linearly inseparable datasets.
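The two geometric steps named above (k-means clustering, then the convex hull of each cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 2-D points, plain Lloyd's iterations with random initialization, and Andrew's monotone chain for the hulls. The function names `kmeans`, `convex_hull`, and `reduce_training_set` are hypothetical, and the additional step of retaining "data points relevant to the SVs" is omitted because the abstract does not specify how those points are chosen.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 2-D tuples; returns a list of k clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def reduce_training_set(points, k):
    """Keep only the convex-hull vertices of each k-means cluster."""
    reduced = []
    for cluster in kmeans(points, k):
        reduced.extend(convex_hull(cluster))
    return reduced
```

In a labeled setting, one plausible use would be to call `reduce_training_set` on the points of each class and train the SVM on the union of the results; whether the paper clusters per class or over the whole dataset is not stated in the abstract.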
