Maximum Margin Clustering with Pairwise Constraints

Maximum margin clustering (MMC), which extends the theory of support vector machines (SVMs) to unsupervised learning, has recently attracted considerable attention. Existing approaches mainly focus on reducing the computational complexity of MMC, but their clustering accuracy is not always guaranteed. In this paper, we propose to incorporate side information in the form of pairwise constraints into MMC to further improve its performance. A set of pairwise loss functions, which penalize violations of the given constraints, is introduced into the clustering objective. We show that the resulting optimization problem can be solved efficiently via the constrained concave-convex procedure (CCCP). Moreover, for constrained multi-class MMC, we present an efficient cutting-plane algorithm to solve the sub-problem in each iteration of CCCP. Experiments demonstrate that the pairwise constrained MMC algorithms considerably outperform the unconstrained MMC algorithms and two other clustering algorithms that exploit the same type of side information.
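To make the idea of "pairwise loss functions that penalize constraint violations" concrete, here is a minimal sketch for the binary case. It assumes a linear decision function f(x) = w^T x + b and uses hinge-based losses that we chose for illustration; the function names (`must_link_loss`, `cannot_link_loss`, `pairwise_penalty`) and the exact form of the losses are hypothetical and are not taken from the paper. A must-link pair is penalized unless both points clear the margin on the same side; a cannot-link pair is penalized unless they clear it on opposite sides.

```python
import numpy as np

def hinge(z):
    """Hinge loss max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

def must_link_loss(fi, fj):
    """Hypothetical must-link penalty: both scores should clear the margin
    on the SAME side, for whichever shared label y in {+1, -1} fits best."""
    return min(hinge(y * fi) + hinge(y * fj) for y in (1.0, -1.0))

def cannot_link_loss(fi, fj):
    """Hypothetical cannot-link penalty: the scores should clear the margin
    on OPPOSITE sides, i.e. labels (y, -y) for the best y in {+1, -1}."""
    return min(hinge(y * fi) + hinge(-y * fj) for y in (1.0, -1.0))

def pairwise_penalty(w, b, X, must_links, cannot_links):
    """Total pairwise penalty for a linear binary clustering hyperplane."""
    f = X @ w + b  # decision values f(x) = w^T x + b
    ml = sum(must_link_loss(f[i], f[j]) for i, j in must_links)
    cl = sum(cannot_link_loss(f[i], f[j]) for i, j in cannot_links)
    return ml + cl

# Toy 1-D data: points 0 and 1 lie on the positive side, 2 and 3 on the negative side.
X = np.array([[2.0], [1.5], [-2.0], [-1.0]])
w, b = np.array([1.0]), 0.0

satisfied = pairwise_penalty(w, b, X, must_links=[(0, 1)], cannot_links=[(0, 2)])
violated = pairwise_penalty(w, b, X, must_links=[(0, 2)], cannot_links=[(0, 1)])
print(satisfied, violated)  # 0.0 5.5
```

Each loss is a minimum of convex hinge terms and is therefore non-convex in (w, b). Because min(a, b) = a + b - max(a, b), it can be written as a difference of two convex functions, which is the kind of structure that CCCP is designed to handle; how the paper itself decomposes its losses is not shown here.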
