Isotropic PCA and Affine-Invariant Clustering

We present an extension of principal component analysis (PCA) and a new algorithm, based on it, for clustering points in $\mathbb{R}^n$. The key property of the algorithm is that it is affine-invariant. When the input is a sample from a mixture of two arbitrary Gaussians, the algorithm correctly classifies the sample assuming only that the two components are separable by a hyperplane, i.e., there exists a halfspace that contains most of the probability mass of one Gaussian and almost none of the other. This is nearly the best possible and improves substantially on known results. For $k > 2$ components, the algorithm requires only that there be some $(k-1)$-dimensional subspace in which the "overlap" in every direction is small. Our main tools are isotropic transformation, spectral projection, and a simple reweighting technique. We call this combination isotropic PCA.
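The three tools named in the abstract can be illustrated with a minimal NumPy sketch for the two-component case. This is not the paper's algorithm as specified: the Gaussian weight function, the default `alpha`, the `mean_threshold` value, the rule for picking an eigenvector, and the split at zero are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def make_isotropic(X):
    """Affinely map the sample to zero mean and identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    evals, evecs = np.linalg.eigh(cov)
    # Inverse square root of the covariance (assumes it has full rank).
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return Xc @ W


def reweighted_moments(Y, alpha):
    """Mean and second-moment matrix under weights exp(-|y|^2 / (2 alpha))."""
    w = np.exp(-np.sum(Y ** 2, axis=1) / (2.0 * alpha))
    w = w / w.sum()
    mean = w @ Y
    second = (Y * w[:, None]).T @ Y
    return mean, second


def separating_direction(Y, alpha=None, mean_threshold=0.25):
    """Pick a unit direction from the reweighted moments of an isotropic sample.

    alpha and mean_threshold are illustrative defaults, not values from the paper.
    """
    n = Y.shape[1]
    if alpha is None:
        alpha = float(n)
    mean, second = reweighted_moments(Y, alpha)
    norm = np.linalg.norm(mean)
    if norm > mean_threshold:
        # Reweighting shifted the mean noticeably: use that shift.
        return mean / norm
    # Otherwise use the eigenvector whose eigenvalue is farthest from the
    # median eigenvalue (an assumed heuristic for the spectral step).
    evals, evecs = np.linalg.eigh(second)
    idx = np.argmax(np.abs(evals - np.median(evals)))
    return evecs[:, idx]


def split_two_clusters(X):
    """Label points by the sign of their projection (a simplification)."""
    Y = make_isotropic(X)
    direction = separating_direction(Y)
    return Y @ direction > 0.0
```

Because `make_isotropic` undoes any invertible affine map of the input up to a rotation, and the weights depend only on the norm of each isotropic point, the resulting labels do not depend on the coordinate system of the original sample (up to swapping the two labels). That is the affine-invariance property the abstract highlights.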

Santosh S. Vempala | S. Charles Brubaker
