Nearest Neighbor Smoothing of Language Models in IR

We hypothesize that using one or more nearest neighbors of a document will give better estimates of the term probabilities, effectively increasing the sample size. We will treat Problems 1 and 2 as the same problem, basing our work on a model similar to that of Lavrenko. We will incorporate an average of the probabilities from the k nearest neighbors, combining this average by linear interpolation with the estimate of the probability of a term given a document. The interpolation will also include an estimate based on collection-wide statistics.
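
The three-way interpolation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the maximum-likelihood component estimates, and the weights lam_doc and lam_nn are assumptions chosen for illustration, and the k nearest neighbors are taken as already given, since the text does not say how they are found.

```python
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram estimate: count(term) / length."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def smoothed_prob(term, doc, neighbors, collection, lam_doc=0.5, lam_nn=0.3):
    """Interpolate document, neighbor-average, and collection estimates.

    doc        -- list of tokens in the document
    neighbors  -- list of token lists, one per nearest neighbor (k >= 1)
    collection -- list of all tokens in the collection
    The weights are hypothetical; the collection weight is the remainder,
    so the three weights sum to 1.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    lam_coll = 1.0 - lam_doc - lam_nn
    if lam_coll < 0:
        raise ValueError("lam_doc + lam_nn must not exceed 1")

    p_doc = mle(doc).get(term, 0.0)
    # Unweighted average of the term's probability over the k neighbors.
    p_nn = sum(mle(n).get(term, 0.0) for n in neighbors) / len(neighbors)
    p_coll = mle(collection).get(term, 0.0)
    return lam_doc * p_doc + lam_nn * p_nn + lam_coll * p_coll


# Toy data: one document, k = 2 neighbors, and the collection they form.
doc = ["a", "b"]
neighbors = [["a", "a"], ["b", "c"]]
collection = doc + [tok for n in neighbors for tok in n]
probs = {t: smoothed_prob(t, doc, neighbors, collection) for t in set(collection)}
```

In this toy example the term "c" never occurs in the document, yet it receives a nonzero probability from the neighbor and collection components, which is the effect the larger effective sample is meant to produce. Because each component is a proper distribution and the weights sum to 1, the smoothed probabilities still sum to 1 over the vocabulary.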