Robust Convex Clustering Analysis

Clustering is an unsupervised learning approach that explores data and seeks groups of similar objects. Many classical clustering models, such as k-means and DBSCAN, are based on heuristic algorithms and suffer from locally optimal solutions and numerical instability. Convex clustering has recently received increasing attention; it leverages sparsity-inducing norms and enjoys many attractive theoretical properties. However, convex clustering is based on Euclidean distance and is therefore not robust against outlier features. Because outlier features are very common, especially when dimensionality is high, this vulnerability has greatly limited the applicability of convex clustering to many real-world datasets. In this paper, we address this challenge by proposing a novel robust convex clustering method that simultaneously performs convex clustering and identifies outlier features. Specifically, the proposed method learns to decompose the data matrix into a clustering structure component and a group-sparse component that captures feature outliers. We develop a block coordinate descent algorithm that iteratively performs convex clustering after outlier features are identified and eliminated. We also propose an efficient algorithm for solving the convex clustering subproblem by exploiting the structure of its dual problem. Moreover, to further illustrate statistical stability, we present a theoretical performance bound for the proposed clustering method. Empirical studies on synthetic and real-world data demonstrate that the proposed robust convex clustering can detect feature outliers as well as improve cluster quality.
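
The alternating scheme described above can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract states no formula: the objective is taken to be 0.5*||X - P - Q||_F^2 + alpha * sum_{i<j} ||P_i - P_j||_2 + beta * sum_j ||Q[:, j]||_2, with uniform pairwise weights, where the rows of P carry the clustering structure and the columns of Q capture outlier features. The function names and the parameters alpha and beta are hypothetical. The convex clustering step uses plain projected gradient ascent on the dual as a stand-in; it is not the efficient dual algorithm the paper proposes.

```python
import numpy as np

def group_soft_threshold(R, beta):
    """Column-wise shrinkage: the proximal operator of beta * sum_j ||R[:, j]||_2.
    Columns whose norm is at most beta are set to zero."""
    norms = np.linalg.norm(R, axis=0)
    scale = np.maximum(0.0, 1.0 - beta / np.maximum(norms, 1e-12))
    return R * scale

def convex_clustering(X, alpha, n_iter=500):
    """Approximately solve min_P 0.5*||X - P||_F^2 + alpha * sum_{i<j} ||P_i - P_j||_2
    (uniform weights, assumed) by projected gradient ascent on the dual."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    i_idx, j_idx = np.triu_indices(n, k=1)   # all pairs (i, j) with i < j
    lam = np.zeros((len(i_idx), d))          # one dual vector per pair
    step = 1.0 / n                           # 1 / (largest Laplacian eigenvalue of K_n)

    def primal(lam):
        # P_i = x_i + sum_{pairs starting at i} lam - sum_{pairs ending at i} lam
        P = X.copy()
        np.add.at(P, i_idx, lam)
        np.subtract.at(P, j_idx, lam)
        return P

    for _ in range(n_iter):
        P = primal(lam)
        lam = lam - step * (P[i_idx] - P[j_idx])          # dual gradient step
        norms = np.linalg.norm(lam, axis=1, keepdims=True)
        lam = lam * np.minimum(1.0, alpha / np.maximum(norms, 1e-12))  # project onto ball
    return primal(lam)

def robust_convex_clustering(X, alpha, beta, n_outer=20):
    """Block coordinate descent: alternate the P-step and the Q-step."""
    X = np.asarray(X, dtype=float)
    P, Q = X.copy(), np.zeros_like(X)
    for _ in range(n_outer):
        P = convex_clustering(X - Q, alpha)       # cluster the outlier-corrected data
        Q = group_soft_threshold(X - P, beta)     # flag features with large residuals
    return P, Q
```

In this sketch, rows of P that coincide (up to numerical tolerance) would be read as one cluster, and columns of Q with nonzero norm would be read as outlier features. Both updates are exact minimizers of their block under the assumed objective, which is what makes the alternation a block coordinate descent.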
