Support Vector Clustering

We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high-dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points. We present a simple algorithm for identifying these clusters. The width of the Gaussian kernel controls the scale at which the data are probed, while the soft margin constant helps in coping with outliers and overlapping clusters. The structure of a dataset is explored by varying these two parameters while keeping the number of support vectors minimal, which ensures smooth cluster boundaries. We demonstrate the performance of our algorithm on several datasets.
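The abstract compresses the whole procedure into a few sentences. What follows is a minimal pure-Python sketch of that pipeline under stated assumptions, not the authors' implementation. The notation is ours: the Gaussian kernel is written as K(x, y) = exp(-q ||x - y||^2) with width parameter q, and the soft margin constant is called C. The dual problem of the enclosing sphere is solved with a simple SMO-style pairwise solver, also our own choice. Clusters are identified by assuming two points belong together when every sampled point on the segment joining them maps inside the sphere; connected components of that relation are then taken. The function names `fit_sphere`, `sphere_r2`, `svc_labels` and the parameters `tol`, `n_samples`, `eps` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, q: float) -> float:
    """K(x, y) = exp(-q * ||x - y||^2); larger q means a narrower kernel."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_sphere(points: List[Point], q: float, C: float,
               tol: float = 1e-10, max_iter: int = 100_000) -> List[float]:
    """Hypothetical helper. Dual of the minimal enclosing sphere:
    minimize beta^T K beta subject to sum(beta) = 1 and 0 <= beta_j <= C.
    Since K(x, x) = 1 for a Gaussian kernel, this equals maximizing
    sum_j beta_j K(x_j, x_j) - sum_ij beta_i beta_j K(x_i, x_j).
    Solved with SMO-style steps on the most violating pair (our choice)."""
    n = len(points)
    if C * n < 1.0:
        raise ValueError("infeasible: need C >= 1/N")
    K = [[gaussian_kernel(a, b, q) for b in points] for a in points]
    beta = [1.0 / n] * n  # feasible starting point
    grad = [2.0 * sum(K[k][j] * beta[j] for j in range(n)) for k in range(n)]
    for _ in range(max_iter):
        can_grow = [k for k in range(n) if beta[k] < C - 1e-12]
        can_shrink = [k for k in range(n) if beta[k] > 1e-12]
        if not can_grow or not can_shrink:
            break
        i = min(can_grow, key=lambda k: grad[k])
        j = max(can_shrink, key=lambda k: grad[k])
        if grad[j] - grad[i] < tol:  # KKT conditions hold to within tol
            break
        # Move mass t from beta_j to beta_i, keeping sum(beta) = 1.
        eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
        lo = max(-beta[i], beta[j] - C)
        hi = min(C - beta[i], beta[j])
        t = hi if eta <= 1e-12 else (grad[j] - grad[i]) / (2.0 * eta)
        t = min(max(t, lo), hi)
        beta[i] += t
        beta[j] -= t
        for k in range(n):
            grad[k] += 2.0 * t * (K[k][i] - K[k][j])
    return beta


def sphere_r2(x: Point, points: List[Point], beta: List[float],
              q: float, bKb: float) -> float:
    """Squared feature-space distance of x's image from the sphere centre:
    1 - 2 sum_j beta_j K(x_j, x) + sum_ij beta_i beta_j K(x_i, x_j)."""
    s = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    return 1.0 - 2.0 * s + bKb


def svc_labels(points: List[Point], q: float, C: float,
               n_samples: int = 20, eps: float = 1e-8) -> List[int]:
    """Hypothetical helper: cluster labels; bounded SVs (beta_j = C) get -1."""
    n = len(points)
    beta = fit_sphere(points, q, C)
    bKb = sum(beta[i] * beta[j] * gaussian_kernel(points[i], points[j], q)
              for i in range(n) for j in range(n))
    bsv = [b >= C - eps for b in beta]
    # Radius from free support vectors (0 < beta < C); if none exist,
    # fall back to the nonzero ones as a rough approximation.
    free = ([k for k in range(n) if eps < beta[k] < C - eps]
            or [k for k in range(n) if beta[k] > eps])
    R2 = min(sphere_r2(points[k], points, beta, q, bKb) for k in free)

    def adjacent(i: int, j: int) -> bool:
        # Linked if every sampled interior point of the segment is inside.
        for step in range(1, n_samples + 1):
            t = step / (n_samples + 1)
            y = [a + t * (b - a) for a, b in zip(points[i], points[j])]
            if sphere_r2(y, points, beta, q, bKb) > R2 + eps:
                return False
        return True

    labels = [-1] * n
    current = 0
    for start in range(n):
        if bsv[start] or labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over the adjacency relation
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and not bsv[j] and adjacent(i, j):
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

For two unit squares placed far apart, `square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]` and `far = [(x + 10.0, y + 10.0) for x, y in square]`, the call `svc_labels(square + far, q=1.0, C=1.0)` returns `[0, 0, 0, 0, 1, 1, 1, 1]`, because every segment between the squares leaves the sphere. In this sketch, raising q narrows the kernel and tends to split the sphere into more components. Lowering C lets some points become bounded support vectors, which are labelled -1 rather than enclosed.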
