Fast Multiplier Methods to Optimize Non-exhaustive, Overlapping Clustering

Clustering is one of the most fundamental tasks in data mining. Traditional clustering algorithms, such as K-means, assign every data point to exactly one cluster. In real-world datasets, however, clusters may overlap, and there are often outliers that should not belong to any cluster. We recently proposed the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective to address both issues in an integrated fashion. Optimizing this discrete objective is NP-hard, and although the objective has a convex relaxation, straightforward convex optimization approaches are too expensive for large datasets. A practical alternative is to use a low-rank factorization of the solution matrix in the convex formulation. The resulting optimization problem is non-convex, and we can locally optimize the objective function using an augmented Lagrangian method. In this paper, we consider two fast multiplier methods to accelerate the convergence of an augmented Lagrangian scheme: a proximal method of multipliers and an alternating direction method of multipliers (ADMM). For the proximal method of multipliers (also called the proximal augmented Lagrangian method), we show a convergence result for the non-convex case with bound-constrained subproblems. On problems with over 10,000 variables, these methods are up to 13 times faster than a standard augmented Lagrangian method, with no change in quality, and bring runtimes down from over an hour to around 5 minutes.
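To make the idea of a proximal method of multipliers with bound-constrained subproblems concrete, the following is a minimal sketch on a toy problem, not the NEO-K-Means low-rank formulation itself. It minimizes (x0 - 2)^2 + (x1 - 1)^2 subject to the non-convex equality constraint x0^2 + x1^2 = 1 and the bounds 0 <= x <= 0.8, whose solution can be checked by hand to be (0.8, 0.6). All names (`solve_subproblem`, `proximal_method_of_multipliers`, `sigma`, `tau`) are hypothetical, and a simple projected gradient loop with backtracking is assumed as the subproblem solver purely for illustration; the paper's actual solver and parameter choices may differ.

```python
import math

def f(x):
    # Toy objective (an assumption for illustration only).
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def c(x):
    # Toy non-convex equality constraint c(x) = 0 (unit circle).
    return x[0] ** 2 + x[1] ** 2 - 1.0

def project(x, lo, hi):
    # Projection onto the box [lo, hi]^2 (the bound constraints).
    return [min(max(v, lo), hi) for v in x]

def prox_aug_lagrangian(x, center, lam, sigma, tau):
    # Value and gradient of
    #   f(x) - lam*c(x) + (sigma/2)*c(x)^2 + ||x - center||^2 / (2*tau).
    cx = c(x)
    d = [x[i] - center[i] for i in range(2)]
    val = (f(x) - lam * cx + 0.5 * sigma * cx * cx
           + (d[0] ** 2 + d[1] ** 2) / (2.0 * tau))
    w = sigma * cx - lam  # multiplier estimate applied to grad c
    grad = [2.0 * (x[0] - 2.0) + w * 2.0 * x[0] + d[0] / tau,
            2.0 * (x[1] - 1.0) + w * 2.0 * x[1] + d[1] / tau]
    return val, grad

def solve_subproblem(x, lam, sigma, tau, lo, hi, max_iter=200, tol=1e-10):
    # Projected gradient descent with Armijo backtracking, warm-started at x.
    center = list(x)  # proximal center is the previous outer iterate
    for _ in range(max_iter):
        val, g = prox_aug_lagrangian(x, center, lam, sigma, tau)
        t = 1.0
        for _ in range(40):
            x_new = project([x[i] - t * g[i] for i in range(2)], lo, hi)
            step = [x_new[i] - x[i] for i in range(2)]
            new_val, _ = prox_aug_lagrangian(x_new, center, lam, sigma, tau)
            if new_val <= val + 1e-4 * (g[0] * step[0] + g[1] * step[1]):
                break
            t *= 0.5
        else:
            return x  # no acceptable step found; treat x as converged
        if math.hypot(step[0], step[1]) < tol:
            return x_new
        x = x_new
    return x

def proximal_method_of_multipliers(x0, sigma=10.0, tau=1.0,
                                   lo=0.0, hi=0.8, outer_iters=50):
    x, lam = project(list(x0), lo, hi), 0.0
    for _ in range(outer_iters):
        x = solve_subproblem(x, lam, sigma, tau, lo, hi)
        lam -= sigma * c(x)  # first-order multiplier update
    return x, lam

x, lam = proximal_method_of_multipliers([0.5, 0.5])
print(x, lam, c(x))
```

In this sketch, letting `tau` grow very large makes the proximal term vanish, which recovers a standard augmented Lagrangian iteration; the finite `tau` is what distinguishes the proximal variant.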
