Clustering methods are an important tool in high-dimensional, exploratory data mining. They aim to identify samples or regions of similar characteristics, often encoding each of them by a single codebook vector or centroid. One of the most commonly used partitional clustering techniques is the k-means algorithm, which in its batch form partitions the data set into k disjoint clusters by simply alternating between a cluster assignment step and a cluster update step; the latter consists of computing a new centroid for each cluster. We generalize the concept of k-means by applying it not to the standard Euclidean space but to the manifold of vector subspaces of a fixed dimension, also known as the Grassmann manifold. Important examples include projective space, i.e., the manifold of lines, and the space of all hyperplanes. Detecting clusters in multiple samples drawn from a Grassmannian is a problem arising in various applications. In this manuscript, we provide suitable metrics for a Grassmann k-means algorithm and solve the centroid computation problem explicitly in closed form. An application to nonnegative matrix factorization illustrates the feasibility of the proposed algorithm.
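The abstract does not spell out which Grassmann metric or closed-form centroid the manuscript derives, so the following is a minimal sketch only, assuming the projection (chordal) Frobenius metric, under which the centroid of a cluster is known in closed form: the span of the top-k eigenvectors of the cluster's averaged projection matrix. All names here (`grassmann_kmeans`, `grassmann_centroid`, `chordal_dist`) are illustrative, not taken from the paper.

```python
import numpy as np

def chordal_dist(U, V):
    """Chordal distance between span(U) and span(V), given orthonormal
    bases U, V of shape (n, k). Equals ||P_U - P_V||_F / sqrt(2), where
    P_U = U U^T is the orthogonal projector onto span(U)."""
    k = U.shape[1]
    # max(..., 0) guards against tiny negative values from round-off
    return np.sqrt(max(k - np.linalg.norm(U.T @ V, 'fro') ** 2, 0.0))

def grassmann_centroid(bases):
    """Closed-form centroid under the projection Frobenius metric:
    the projector onto the top-k eigenvectors of the mean projector."""
    k = bases[0].shape[1]
    P_mean = sum(U @ U.T for U in bases) / len(bases)
    # eigh returns eigenvalues in ascending order; keep the last k vectors
    _, V = np.linalg.eigh(P_mean)
    return V[:, -k:]

def grassmann_kmeans(bases, n_clusters, n_iter=50, seed=0):
    """Batch k-means on the Grassmannian: alternate nearest-centroid
    assignment with the closed-form centroid update above."""
    rng = np.random.default_rng(seed)
    centroids = [bases[i] for i in rng.choice(len(bases), n_clusters, replace=False)]
    labels = np.zeros(len(bases), dtype=int)
    for _ in range(n_iter):
        # assignment step: nearest centroid in chordal distance
        new_labels = np.array([
            np.argmin([chordal_dist(U, C) for C in centroids]) for U in bases
        ])
        # update step: recompute each non-empty cluster's centroid
        for j in range(n_clusters):
            members = [U for U, l in zip(bases, new_labels) if l == j]
            if members:
                centroids[j] = grassmann_centroid(members)
        if np.array_equal(new_labels, labels):
            break  # assignments stable, algorithm has converged
        labels = new_labels
    return labels, centroids

if __name__ == "__main__":
    # toy check: 2-dimensional subspaces of R^5, perturbed around two
    # hypothetical "true" subspaces, should separate into two clusters
    rng = np.random.default_rng(1)
    def rand_basis(n, k):
        return np.linalg.qr(rng.standard_normal((n, k)))[0]
    truth = [rand_basis(5, 2) for _ in range(2)]
    data = [np.linalg.qr(T + 0.1 * rng.standard_normal(T.shape))[0]
            for T in truth for _ in range(20)]
    labels, _ = grassmann_kmeans(data, n_clusters=2)
    print(labels)
```

The closed form in `grassmann_centroid` follows because, over rank-k projectors P, minimizing the sum of ||P_i - P||_F^2 is equivalent to maximizing tr(P P_mean), which the dominant eigenspace of P_mean achieves; whether this coincides with the manuscript's choice of metric and centroid is an assumption.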