A general classification rule for probability measures

Abstract: We consider the problem of classifying an unknown probability distribution based on a sequence of random samples drawn according to this distribution. Specifically, if $A$ is a subset of the space $M_1(\Sigma)$ of all probability measures over some compact Polish space $\Sigma$, we want to decide whether the unknown distribution belongs to $A$ or to its complement. We propose an algorithm which leads almost surely (a.s.) to a correct decision for any $A$ satisfying certain structural assumptions. A refined decision procedure is also presented which, given a countable collection $A_i \subset M_1(\Sigma)$, $i = 1, 2, \ldots$, each satisfying the structural assumptions, will eventually determine a.s. the membership of the distribution in any finite number of the $A_i$. Applications to density estimation and to the problem of order determination of Markov processes are discussed.