Compressive Clustering of High-Dimensional Data

In this paper we focus on realistic clustering problems where the input data is high-dimensional and the clusters have complex, multimodal distribution. In this challenging setting the conventional methods, such as k-centers family, hierarchical clustering or those based on model fitting, are inefficient and typically converge far from the globally optimal solution. As an alternative, we propose a novel unsupervised learning approach which is based on the compressive sensing paradigm. The key idea underlying our algorithm is to monitor the distance between the test sample and its principal projection in each cluster, and continue re-assigning it to the cluster yielding the smallest residual. As a result, we obtain an iterative procedure which, under the compressive assumptions, minimizes the total reconstruction error of all samples from their nearest clusters. To evaluate the proposed approach, we have conducted a series of experiments involving various image collections where the task was to automatically group similar objects. Comparison of the obtained results with those yielded by the state-of-the-art clustering methods provides evidence for high discriminative power of our algorithm.

[1]  Anil K. Jain,et al.  Unsupervised Learning of Finite Mixture Models , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Robin Sibson,et al.  SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method , 1973, Comput. J..

[3]  Nizar Bouguila,et al.  High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[5]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[6]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Dimitrios Gunopulos,et al.  Automatic Subspace Clustering of High Dimensional Data , 2005, Data Mining and Knowledge Discovery.

[8]  Stephen J. Wright Primal-Dual Interior-Point Methods , 1997, Other Titles in Applied Mathematics.

[9]  BouguilaNizar,et al.  A Hybrid Feature Extraction Selection Approach for High-Dimensional Non-Gaussian Data Clustering , 2009 .

[10]  P. Tseng,et al.  On the convergence of the coordinate descent method for convex differentiable minimization , 1992 .

[11]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[12]  G. Cho A NEW PRIMAL-DUAL INTERIOR POINT METHOD FOR LINEAR OPTIMIZATION , 2009 .

[13]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Fatih Murat Porikli,et al.  Connecting the dots in multi-class classification: From nearest subspace to collaborative representation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Emmanuel J. Candès,et al.  Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information , 2004, IEEE Transactions on Information Theory.

[16]  Nizar Bouguila,et al.  A Hybrid Feature Extraction Selection Approach for High-Dimensional Non-Gaussian Data Clustering , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Hans-Peter Kriegel,et al.  Density-Connected Subspace Clustering for High-Dimensional Data , 2004, SDM.

[18]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  D. Defays,et al.  An Efficient Algorithm for a Complete Link Method , 1977, Comput. J..

[20]  Hillol Kargupta,et al.  Distributed Clustering Using Collective Principal Component Analysis , 2001, Knowledge and Information Systems.

[21]  Paul M. B. Vitányi,et al.  Clustering by compression , 2003, IEEE Transactions on Information Theory.