A Bayesian Interpretation of Interpolated Kneser-Ney

Interpolated Kneser-Ney is one of the best smoothing methods for n-gram language models. Previous explanations for its superiority have been based on intuitive and empirical justifications of specific properties of the method. We propose a novel interpretation of interpolated Kneser-Ney as approximate inference in a hierarchical Bayesian model consisting of Pitman-Yor processes. As opposed to past explanations, our interpretation can recover exactly the formulation of interpolated Kneser-Ney, and performs better than interpolated Kneser-Ney when a better inference procedure is used.
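
For readers unfamiliar with the method being interpreted, the following is a minimal sketch of interpolated Kneser-Ney for bigrams only, assuming a single fixed discount. It is an illustration of the standard textbook estimator, not the hierarchical Pitman-Yor model or the inference procedure proposed here. The function name `interpolated_kneser_ney_bigram`, the default `discount=0.75`, and the toy corpus are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def interpolated_kneser_ney_bigram(tokens, discount=0.75):
    """Build a bigram model; returns prob(word, context) -> float.

    Hypothetical helper for illustration: one fixed discount, bigrams only.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter()        # c(u): how often u occurs as a context
    followers = defaultdict(set)      # distinct words observed after u
    predecessors = defaultdict(set)   # distinct contexts observed before w
    for (u, w), count in bigrams.items():
        context_totals[u] += count
        followers[u].add(w)
        predecessors[w].add(u)
    n_bigram_types = len(bigrams)

    def prob(word, context):
        # Lower-order "continuation" probability: the share of distinct
        # bigram types that end in `word`, rather than its raw frequency.
        p_cont = len(predecessors.get(word, ())) / n_bigram_types
        total = context_totals.get(context, 0)
        if total == 0:
            # Unseen context: fall back entirely to the lower-order model.
            return p_cont
        # Subtract the discount from each observed bigram count ...
        discounted = max(bigrams.get((context, word), 0) - discount, 0.0) / total
        # ... and give the removed mass to the lower-order distribution.
        backoff_weight = discount * len(followers[context]) / total
        return discounted + backoff_weight * p_cont

    return prob


# Toy usage (assumed corpus, for illustration only).
corpus = "the cat sat on the mat the cat ate".split()
prob = interpolated_kneser_ney_bigram(corpus)
vocab = sorted(set(corpus))
total_mass = sum(prob(w, "the") for w in vocab)
```

With a discount no larger than the smallest observed count, the probabilities over the vocabulary sum to one for any context, because the mass removed from observed bigrams is exactly the mass redistributed through the continuation distribution.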