Paper information - Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1+ε) under a projection onto a random O(log(k/ε)/ε²)-dimensional subspace. Further, the cost of every clustering is preserved within (1+ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.
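The claim that the cost of every clustering is preserved can be checked numerically on a small synthetic instance. The following is a minimal sketch, not the paper's construction: it assumes a scaled i.i.d. Gaussian matrix as the dimension reduction map (one common map with sub-Gaussian tails, used here instead of a projection onto a random subspace), picks the constant 4 in the target dimension arbitrarily, and uses an arbitrary random partition rather than an optimal clustering. The names `kmeans_cost` and `gaussian_projection` are hypothetical helpers introduced for illustration.

```python
import numpy as np

def kmeans_cost(points, labels, k):
    """k-means cost of a fixed partition: squared distances to cluster centroids."""
    cost = 0.0
    for j in range(k):
        cluster = points[labels == j]
        if len(cluster) > 0:
            cost += np.sum((cluster - cluster.mean(axis=0)) ** 2)
    return cost

def gaussian_projection(points, target_dim, rng):
    """Map points to target_dim dimensions with a scaled i.i.d. Gaussian matrix."""
    source_dim = points.shape[1]
    matrix = rng.normal(size=(source_dim, target_dim)) / np.sqrt(target_dim)
    return points @ matrix

rng = np.random.default_rng(0)
n, source_dim, k, eps = 600, 1000, 5, 0.25

# Target dimension of order log(k/eps)/eps^2; the constant 4 is an assumption.
target_dim = int(np.ceil(4 * np.log(k / eps) / eps**2))

points = rng.normal(size=(n, source_dim))
labels = rng.integers(0, k, size=n)  # an arbitrary partition, not an optimal one

projected = gaussian_projection(points, target_dim, rng)
ratio = kmeans_cost(projected, labels, k) / kmeans_cost(points, labels, k)
print(f"target_dim={target_dim}, cost ratio={ratio:.3f}")
```

Because the map is linear, the centroid of each projected cluster is the projection of the original centroid, so the same partition can be scored in both spaces. A ratio close to 1 on one partition illustrates the statement but does not verify it for all clusterings simultaneously.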

[1] Dimitris Achlioptas, et al. Database-friendly random projections: Johnson-Lindenstrauss with binary coins, 2003, J. Comput. Syst. Sci.

[2] Tamás Sarlós, et al. Improved Approximation Algorithms for Large Matrices via Random Projections, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[3] Anirban Dasgupta, et al. A sparse Johnson-Lindenstrauss transform, 2010, STOC '10.

[4] Anil K. Jain. Data clustering: 50 years beyond K-means, 2008, Pattern Recognit. Lett.

[5] W. B. Johnson, et al. Extensions of Lipschitz mappings into Hilbert space, 1984.

[6] Noga Alon, et al. Problems and results in extremal combinatorics--I, 2003, Discret. Math.

[7] Rachel Ward, et al. New and Improved Johnson-Lindenstrauss Embeddings via the Restricted Isometry Property, 2010, SIAM J. Math. Anal.

[8] Nir Ailon, et al. Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes, 2008, SODA '08.

[9] Kasper Green Larsen, et al. Optimality of the Johnson-Lindenstrauss Lemma, 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[10] M. D. Kirszbraun. Über die zusammenziehende und Lipschitzsche Transformationen, 1934.

[11] P. Massart, et al. Adaptive estimation of a quadratic functional by model selection, 2000.

[12] Nariman Farvardin, et al. A study of vector quantization for noisy channels, 1990, IEEE Trans. Inf. Theory.

[13] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[14] Christos Boutsidis, et al. Random Projections for $k$-means Clustering, 2010, NIPS.

[15] Dan Feldman, et al. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering, 2013, SODA.

[16] Christos Boutsidis, et al. Randomized Dimensionality Reduction for $k$-Means Clustering, 2011, IEEE Transactions on Information Theory.

[17] Bernard Chazelle, et al. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform, 2006, STOC '06.

[18] Alan M. Frieze, et al. Clustering in large graphs and matrices, 1999, SODA '99.

[19] Fabrizio Grandoni, et al. Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma, 2019, STOC.

[20] Assaf Naor, et al. Metric dimension reduction: A snapshot of the Ribe program, 2018, Proceedings of the International Congress of Mathematicians (ICM 2018).

[21] Daniel M. Kane, et al. Sparser Johnson-Lindenstrauss Transforms, 2010, JACM.

[22] Michael B. Cohen, et al. Dimensionality Reduction for k-Means Clustering and Low Rank Approximation, 2014, STOC.

[23] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Christos Boutsidis, et al. Unsupervised Feature Selection for the $k$-means Clustering Problem, 2009, NIPS.

[25] Sanjoy Dasgupta, et al. An elementary proof of a theorem of Johnson and Lindenstrauss, 2003, Random Struct. Algorithms.

[26] M. Sion. On general minimax theorems, 1958.

[27] S. P. Lloyd, et al. Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.

[28] Nir Ailon, et al. An almost optimal unrestricted fast Johnson-Lindenstrauss transform, 2010, SODA '11.

[29] Christos Boutsidis, et al. Deterministic Feature Selection for K-Means Clustering, 2011, IEEE Transactions on Information Theory.

[30] M. Talagrand, et al. Probability in Banach Spaces: Isoperimetry and Processes, 1991.

[31] David P. Woodruff, et al. Optimal Approximate Matrix Product in Terms of Stable Rank, 2015, ICALP.

[32] Roman Vershynin, et al. High-Dimensional Probability, 2018.

[33] David P. Woodruff, et al. Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension, 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[34] Mary Wootters, et al. New constructions of RIP matrices with fast multiplication and fewer rows, 2012, SODA.

[35] S. Mendelson, et al. Empirical processes and random projections, 2005.