Unsupervised Classification with Stochastic Complexity

In unsupervised classification, we are given a collection of samples and must label them to show their class membership, without knowing anything about the underlying data generating machinery, not even the number of classes. That is, we are given some sequence of observed objects 1, 2, … , n, on which we have made a number of measurements X=x1, x2… xn where we have taken k measurements x i =x i1 ,=x i2 …, x ik on each object i. Now we must assign each object to one of a number of classes C = C 1,…, C c in an “optimal” fashion. A solution to this problem, then, consists of (i) a measure of the quality of a given classification, and (ii) an algorithm for classifying a given set of objects.