Gaussian kernel width exploration and cone cluster labeling for support vector clustering

Clustering groups data points so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. Support vector clustering (SVC) is a clustering approach that can identify arbitrarily shaped cluster boundaries. The execution time of SVC depends heavily on three factors: the choice of the width of the kernel function, which determines a nonlinear transformation of the input data; the solution of a quadratic program; and the way the output of the quadratic program is used to produce clusters. This paper builds on our prior SVC research in two ways. First, we propose a method for identifying a kernel width value in a region where our experiments suggest that the clustering structure changes significantly. This value can serve as the starting point for efficient exploration of the space of kernel width values. Second, we offer a technique, called cone cluster labeling, that uses the output of the quadratic program to build clusters in a novel way that avoids an important deficiency of previous methods. Our experimental results cover both two-dimensional and high-dimensional data sets.
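To illustrate why the kernel width matters, the following minimal sketch evaluates a Gaussian kernel for a few width values. It assumes the form K(x, y) = exp(-q * ||x - y||^2) commonly used in the SVC literature; the symbol `q`, the sample points, and the function names are illustrative assumptions and are not taken from this paper.

```python
import math

def gaussian_kernel(x, y, q):
    """Gaussian kernel exp(-q * ||x - y||^2); q is the assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-q * sq_dist)

def kernel_matrix(points, q):
    """Pairwise kernel values for a list of points."""
    return [[gaussian_kernel(x, y, q) for y in points] for x in points]

# Hypothetical toy data: two tight pairs of points that lie far apart.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]

# Kernel matrices for a small, a moderate, and a large width value.
matrices = {q: kernel_matrix(points, q) for q in (0.01, 1.0, 100.0)}

for q, K in matrices.items():
    print(f"q={q}: near pair {K[0][1]:.4f}, far pair {K[0][2]:.4f}")
```

In this toy setting, a small `q` makes all points look similar, while a large `q` makes even nearby points look dissimilar. Intermediate values separate the two pairs, which is the kind of structural change that a width exploration strategy would try to locate.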
