Vicinal support vector classifier using supervised kernel-based clustering

OBJECTIVE Support vector machines (SVMs) have drawn considerable attention due to their high generalisation ability and superior classification performance compared to other pattern recognition algorithms. However, the underlying assumption that the learning data are independently and identically distributed (i.i.d.) samples from a single unknown probability distribution may limit the applicability of SVMs to real problems. In this paper, we propose a vicinal support vector classifier (VSVC) that can effectively handle practical applications in which the learning data may originate from different probability distributions.

METHODS The proposed VSVC utilises a set of new vicinal kernel functions constructed via supervised clustering in the kernel-induced feature space. The approach comprises two steps. In the clustering step, a supervised kernel-based deterministic annealing (SKDA) clustering algorithm partitions the training data into soft vicinal areas of the feature space, from which the vicinal kernel functions are constructed. In the training step, the standard SVM technique is used to minimise the vicinal risk function under the constraints of the vicinal areas defined in the SKDA clustering step.

RESULTS Experimental results on both artificial and real medical datasets show that the proposed VSVC achieves better classification accuracy at lower computational cost than a standard SVM. On an artificial dataset constructed from non-separable data, the classification accuracy of VSVC lies between 95.5% and 96.25% (for different cluster numbers), comparing favourably with the 94.5% achieved by the SVM, while the VSVC training time of 8.75 s to 17.83 s (for 2-8 clusters) is considerably less than the 65.0 s required by the SVM. On a real mammography dataset, the best classification accuracy of VSVC is 85.7%, clearly outperforming the standard SVM's 82.1%. A similar improvement is confirmed on two further real datasets, a breast cancer dataset (74.01% vs. 72.52%) and a heart dataset (84.77% vs. 83.81%), together with a reduction in learning time (32.07 s vs. 92.08 s and 25.00 s vs. 53.31 s, respectively). Furthermore, VSVC yields a number of support vectors equal to the specified number of clusters, and hence a much sparser solution than a standard SVM.

CONCLUSION Incorporating a supervised clustering algorithm into the SVM technique leads to a sparse yet effective solution, while making the proposed VSVC adaptive to different probability distributions of the training data.
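The abstract refers to minimising "the vicinal risk function" without stating it. For orientation, a standard form is the vicinal risk of Chapelle et al. (NIPS 2000), in which each training point is replaced by a vicinity distribution around it; how VSVC instantiates the vicinities (as SKDA's soft clusters rather than point-wise neighbourhoods) is the paper's contribution, so the display below is background, not the paper's exact objective:

```latex
% Vicinal risk (Chapelle et al., 2000): average expected loss over a
% vicinity distribution P_{x_i} around each training point x_i.
% Empirical risk is recovered when P_{x_i} = \delta_{x_i}.
R_{\mathrm{vic}}(f) = \frac{1}{\ell} \sum_{i=1}^{\ell}
    \int \mathcal{L}\bigl(f(x),\, y_i\bigr)\, \mathrm{d}P_{x_i}(x)
```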

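Since the abstract gives only the two-step structure (SKDA clustering, then SVM training over the resulting vicinal areas) and not the SKDA updates or the exact vicinal kernels, the following Python sketch illustrates that structure only: plain k-means per class stands in for SKDA, soft memberships use a fixed-temperature Gibbs assignment as in deterministic annealing, and the "vicinal kernel" between two areas is taken to be a membership-weighted average of base kernel values. The class name VicinalSVC, the parameters beta and clusters_per_class, and the weighting scheme are all assumptions made for illustration, not the authors' method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def rbf(X, Y, gamma=0.5):
    """Base Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)


class VicinalSVC:
    """Toy two-step VSVC: class-wise clustering, then an SVM trained on a
    cluster-level (vicinal) Gram matrix. Hypothetical illustration only."""

    def __init__(self, clusters_per_class=3, gamma=0.5, beta=5.0, C=10.0):
        self.k, self.gamma, self.beta, self.C = clusters_per_class, gamma, beta, C

    def fit(self, X, y):
        # Step 1 (stand-in for SKDA): cluster each class separately so every
        # vicinal area carries a single known label (supervised clustering).
        centers, labels = [], []
        for c in np.unique(y):
            km = KMeans(n_clusters=self.k, n_init=10, random_state=0).fit(X[y == c])
            centers.append(km.cluster_centers_)
            labels += [c] * self.k
        self.centers_ = np.vstack(centers)
        self.cluster_y_ = np.array(labels)

        # Soft memberships P(j | x_i) ~ exp(-beta * ||x_i - c_j||^2): a Gibbs
        # assignment at fixed temperature 1/beta, as in deterministic annealing.
        # Subtracting the row-wise minimum prevents the exponentials underflowing.
        d2 = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(-1)
        P = np.exp(-self.beta * (d2 - d2.min(axis=1, keepdims=True)))
        P /= P.sum(axis=1, keepdims=True)
        self.W_ = P / P.sum(axis=0, keepdims=True)  # column-normalised weights
        self.X_ = X

        # Assumed vicinal kernel between areas j and l: membership-weighted
        # average of base kernel values; PSD because the base Gram matrix is.
        K_vic = self.W_.T @ rbf(X, X, self.gamma) @ self.W_

        # Step 2: standard SVM on the cluster-level Gram matrix, so candidate
        # support vectors are the vicinal areas themselves (one per cluster).
        self.svc_ = SVC(kernel="precomputed", C=self.C).fit(K_vic, self.cluster_y_)
        return self

    def predict(self, X_test):
        # The same weighted averaging gives the test-vs-cluster kernel rows.
        return self.svc_.predict(rbf(X_test, self.X_, self.gamma) @ self.W_)


# Illustrative run on synthetic two-class data.
X, y = make_classification(n_samples=400, n_features=2, n_redundant=0,
                           n_clusters_per_class=2, random_state=0)
model = VicinalSVC(clusters_per_class=4).fit(X[:300], y[:300])
print("held-out accuracy:", (model.predict(X[300:]) == y[300:]).mean())
```

Fitting the SVM on one Gram row per vicinal area means the support vectors are drawn from the set of clusters, which mirrors, in simplified form, the sparsity behaviour reported in the results (number of support vectors equal to the cluster number).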