Information retrieval using hierarchical Dirichlet processes

An information retrieval method is proposed that uses a hierarchical Dirichlet process as a prior on the parameters of a set of multinomial distributions. The resulting method naturally recovers several features of other popular methods, specifically tf.idf-like term weighting and document length normalisation. The new method is compared with Okapi BM25 [3] and the Twenty-One model [1] on TREC data and is shown to outperform both.
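To make the link between a Dirichlet prior and length normalisation concrete, the following is a minimal sketch of a much simpler, single-level relative of the proposed model: query-likelihood scoring with a multinomial document model smoothed by a Dirichlet prior centred on collection frequencies. It is not the hierarchical method described above. The function name `dirichlet_score`, the prior strength `mu`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, collection_counts, collection_len, mu=10.0):
    """Log query likelihood under a Dirichlet-smoothed multinomial document model.

    Illustrative single-level sketch only; `mu` is a hypothetical prior strength
    set to a toy value for this tiny collection.
    """
    counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_coll = collection_counts.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # terms unseen in the collection are skipped in this sketch
        # Posterior mean of the multinomial parameter: the denominator
        # (doc_len + mu) is where document length normalisation appears.
        p_term = (counts[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_term)
    return score


# Toy collection (assumed data, not from TREC).
docs = {
    "d1": "dirichlet process prior for retrieval".split(),
    "d2": "the cat sat on the mat near the other cat".split(),
}
collection_counts = Counter(term for doc in docs.values() for term in doc)
collection_len = sum(collection_counts.values())

query = ["dirichlet", "retrieval"]
scores = {
    name: dirichlet_score(query, doc, collection_counts, collection_len)
    for name, doc in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, a matching term that is rare in the collection has a small smoothing mass `mu * p_coll`, so its occurrence in a document raises the score proportionally more than a common term would, which is one informal way to see idf-like behaviour emerge from the prior.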