Large-Scale High-Dimensional Clustering with Fast Sketching

In this paper, we address the problem of high-dimensional $k$-means clustering in a large-scale setting, i.e., for datasets comprising a large number of items. Sketching techniques have previously been used to deal with this "large-scale" issue by compressing the whole dataset into a single vector of random nonlinear generalized moments, from which the $k$ centroids are then retrieved efficiently. However, the cost of computing this sketch usually scales quadratically with the dimension of the data; to cope with high-dimensional datasets, we show how to use fast structured random matrices to compute the sketching operator efficiently. This yields significant speed-ups and memory savings for high-dimensional data, while the clustering results are shown to be much more stable, both on artificial and real datasets.
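
To make the two ingredients of the abstract concrete, here is a minimal, illustrative sketch, not the authors' implementation. It assumes the nonlinear generalized moments are random Fourier features (averages of complex exponentials), and that the dense Gaussian frequency matrix is replaced by a Fastfood-style product of diagonal sign matrices and Walsh-Hadamard transforms. All function names (dense_sketch, hd_block, fast_sketch) and the crude radial scaling are assumptions for illustration only.

```python
# Illustrative sketch (assumed setup, not the paper's code): random-Fourier-
# feature moments, with a structured HD-block frequency matrix replacing a
# dense Gaussian draw.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)

def dense_sketch(X, Omega):
    """Empirical sketch: average of exp(i * Omega^T x) over the dataset.
    X: (n, d) data, Omega: (d, m) Gaussian frequencies. Cost O(n d m)."""
    return np.exp(1j * X @ Omega).mean(axis=0)

def hd_block(Z, signs):
    """One structured block: random +/-1 diagonal, then a normalized
    Walsh-Hadamard transform. Here the Hadamard matrix is built explicitly
    for simplicity; an in-place fast transform would cost O(d log d)."""
    d = Z.shape[-1]
    H = hadamard(d) / np.sqrt(d)  # requires d to be a power of two
    return (Z * signs) @ H

def fast_sketch(X, signs_list, scales):
    """Structured sketch: the frequency matrix is (implicitly) a product of
    HD blocks and a random radial scaling, so Omega^T x is applied without
    ever storing a dense d x m matrix."""
    Z = X
    for signs in signs_list:
        Z = hd_block(Z, signs)
    return np.exp(1j * Z * scales).mean(axis=0)

# Toy usage: d must be a power of two for the Hadamard transform.
n, d = 1000, 64
X = rng.standard_normal((n, d))

Omega = rng.standard_normal((d, d))            # dense baseline
z_dense = dense_sketch(X, Omega)

signs_list = [rng.choice([-1.0, 1.0], size=d) for _ in range(3)]
scales = np.abs(rng.standard_normal(d))        # crude radial distribution
z_fast = fast_sketch(X, signs_list, scales)    # (d,) complex sketch vector
```

One structured block yields only d frequencies; to obtain a sketch of size m > d, one would stack several independent blocks, and a real implementation would use an in-place fast Walsh-Hadamard transform rather than the explicit matrix above. The centroids are then recovered from the sketch by a separate decoding step (e.g., a greedy compressive-learning solver), which this snippet does not cover.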
