Max-margin clustering: Detecting margins from projections of points on lines

Given an unlabelled set of points X ∈ R^N belonging to k groups, we propose a method to identify cluster assignments that provide the maximum separating margin among the clusters. We address this problem by exploiting the sparsity of data points inherent to margin regions, which a max-margin classifier would produce in a supervised setting to separate points belonging to different groups. By analyzing the projections of X on the set of all possible lines L in R^N, we first establish some basic results that are satisfied only by those line intervals lying outside a cluster, under the assumptions that the clusters are linearly separable and that there are no outliers. We then encode these results into a pair-wise similarity measure to determine cluster assignments, and accommodate non-linearly separable clusters using the kernel trick. We validate our method on several UCI datasets and on some computer vision problems, and empirically show its robustness to outliers and to cases where the exact number of clusters is not available. The proposed approach improves clustering accuracy by about 6% on average, and by up to 15%, compared with several existing methods.
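The projection idea can be illustrated with a small sketch. It is not the similarity measure defined in the paper: the function names `max_gap_on_segment` and `gap_similarity`, the exponential form, and the `sigma` parameter are all assumptions made for illustration. For a pair of points, it projects every point onto the line through the pair, finds the largest empty interval between consecutive projections that fall on the segment joining them, and maps a large gap (a likely margin) to a low similarity.

```python
import math

def max_gap_on_segment(points, i, j):
    """Largest empty interval between consecutive projections of all
    points onto the segment joining points[i] and points[j]."""
    a, b = points[i], points[j]
    direction = [bk - ak for ak, bk in zip(a, b)]
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0.0:
        return 0.0
    unit = [d / length for d in direction]
    # Scalar position of each other point along the line, measured from a.
    inner = []
    for k, p in enumerate(points):
        if k in (i, j):
            continue
        t = sum((pk - ak) * uk for pk, ak, uk in zip(p, a, unit))
        if 0.0 < t < length:
            inner.append(t)
    # The two endpoints always bound the segment.
    ts = sorted([0.0, length] + inner)
    return max(t2 - t1 for t1, t2 in zip(ts, ts[1:]))

def gap_similarity(points, i, j, sigma=1.0):
    """Hypothetical similarity: a wide empty interval between two points
    suggests a margin separates them, so similarity decays with the gap."""
    return math.exp(-max_gap_on_segment(points, i, j) / sigma)

# Two well-separated groups on a line (toy data, assumed for illustration).
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
          (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
within = gap_similarity(points, 0, 2)   # both points in the first group
across = gap_similarity(points, 0, 3)   # points in different groups
```

On this toy data the within-group pair receives a higher similarity than the across-group pair, because the segment joining the two groups contains a wide interval with no projected points. A matrix of such pair-wise values could then be passed to any similarity-based clustering step.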
