k-means Approach to the Karhunen-Loeve Transform

Abstract—We present a simultaneous generalization of the well-known Karhunen-Loève (PCA) and k-means algorithms. The basic idea lies in approximating the data with k affine subspaces of a given dimension n. In the case n = 0 we obtain the classical k-means, while for k = 1 we obtain the PCA algorithm. We show that for some data exploration problems this method gives better results than either of the classical approaches.

Index Terms—Karhunen-Loève Transform, PCA, k-means, optimization, compression, data compression, image compression.

I. INTRODUCTION

Our general problem concerns splitting a given data set W into clusters with respect to their intrinsic dimensionality. The motivation for creating such an algorithm is the desire to extract parts of the data which can be easily described by a smaller number of parameters. More precisely, we want to find affine subspaces S_1, ..., S_k such that every element of W belongs (up to a certain maximal error) to one of the spaces S_1, ..., S_k.

To illustrate this graphically, let us consider the following example. Figure 1(a) shows three lines in the plane, while Figure 1(b) shows a circle and an orthogonal line in space. Our goal is to construct an algorithm that splits these data sets into three lines in the first case, and into a line and a circle in the second.
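To make the approach concrete, below is a minimal Python sketch of the alternating scheme the abstract describes: assign every point to its nearest affine subspace, then refit each subspace by PCA (mean plus top-n principal directions) on its assigned points. The function name fit_k_flats and its parameters are illustrative, not taken from the paper; this is a sketch of the general k-flats idea under a Euclidean error measure, not the exact procedure developed here. Note that n = 0 recovers classical k-means and k = 1 recovers PCA, as claimed above.

```python
import numpy as np

def fit_k_flats(X, k, n, n_iter=50, seed=0):
    """Illustrative sketch: approximate X (m x d) with k affine
    subspaces ("flats") of dimension n.
    n = 0 reduces to classical k-means; k = 1 reduces to PCA."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    labels = rng.integers(k, size=m)   # random initial assignment
    means, bases = [], []
    for _ in range(n_iter):
        means, bases = [], []
        for j in range(k):
            P = X[labels == j]
            if len(P) == 0:            # re-seed an empty cluster
                P = X[rng.integers(m, size=1)]
            mu = P.mean(axis=0)
            # Top-n principal directions of the cluster via SVD.
            _, _, Vt = np.linalg.svd(P - mu, full_matrices=False)
            means.append(mu)
            bases.append(Vt[:n])       # (n x d); empty when n = 0
        # Distance of each point to each flat: norm of the residual
        # after projecting the centered point onto the flat's basis.
        dists = np.empty((m, k))
        for j in range(k):
            C = X - means[j]
            resid = C - (C @ bases[j].T) @ bases[j]
            dists[:, j] = np.linalg.norm(resid, axis=1)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                      # assignments stabilized
        labels = new_labels
    return labels, means, bases
```

For instance, labels, means, bases = fit_k_flats(X, k=3, n=1) would attempt to split a planar data set X into the three lines of Figure 1(a).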
