Assignment methods are at the heart of many algorithms for unsupervised learning and clustering, in particular the well-known K-means and Expectation-Maximization (EM) algorithms. In this work, we study several different methods of assignment, including the "hard" assignments used by K-means and the "soft" assignments used by EM. While it is known that K-means minimizes the distortion on the data and EM maximizes the likelihood, little is known about the systematic differences in behavior between the two algorithms. Here we shed light on these differences via an information-theoretic analysis. The cornerstone of our results is a simple decomposition of the expected distortion, showing that K-means (and its extension for inferring general parametric densities from unlabeled sample data) must implicitly manage a trade-off between how similar the data assigned to each cluster are and how evenly the data are balanced among the clusters. How well the data are balanced is measured by the entropy of the partition defined by the hard assignments. In addition to letting us predict and verify systematic differences between K-means and EM on specific examples, the decomposition allows us to give a rather general argument showing that K-means will consistently find densities with less "overlap" than EM. We also study a third natural assignment method, which we call posterior assignment, that is close in spirit to the soft assignments of EM but leads to a surprisingly different algorithm.
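To make the distinction between hard and soft assignments concrete, the following is a minimal sketch rather than the method analyzed in this work. It assumes two one-dimensional Gaussian densities with fixed, known parameters. It also assumes that a hard assignment sends each point to the density with the largest value at that point, and that a soft assignment is the posterior probability of each density given the point. All function names (`gaussian_pdf`, `soft_assignments`, `hard_assignments`, `partition_entropy`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated at each point of x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))

def soft_assignments(x, means, stds, weights):
    """EM-style soft assignment: posterior probability of each density per point."""
    joint = np.stack(
        [w * gaussian_pdf(x, m, s) for m, s, w in zip(means, stds, weights)],
        axis=1,
    )
    return joint / joint.sum(axis=1, keepdims=True)

def hard_assignments(x, means, stds):
    """K-means-style hard assignment (assumed rule): index of the largest density."""
    dens = np.stack([gaussian_pdf(x, m, s) for m, s in zip(means, stds)], axis=1)
    return dens.argmax(axis=1)

def partition_entropy(labels, k):
    """Entropy in bits of the fractions of points assigned to each of k clusters."""
    frac = np.bincount(labels, minlength=k) / len(labels)
    nz = frac[frac > 0]  # skip empty clusters to avoid log2(0)
    return float(-(nz * np.log2(nz)).sum())

# Illustrative data: equal-sized samples from two unit-variance Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])

# Assumed (not fitted) model parameters.
means, stds, weights = [-2.0, 2.0], [1.0, 1.0], [0.5, 0.5]

soft = soft_assignments(x, means, stds, weights)  # shape (1000, 2), rows sum to 1
hard = hard_assignments(x, means, stds)           # shape (1000,), values in {0, 1}
entropy_bits = partition_entropy(hard, k=2)       # balance of the hard partition
```

An entropy near 1 bit indicates that the hard assignments split the points almost evenly between the two clusters, while a value near 0 indicates that nearly all points fall into a single cluster.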