Robust Evaluation of Topic Models

◮ Despite recent advances in learning and inference algorithms, evaluating the predictive performance of topic models is still painfully slow and unreliable.
◮ We propose a new strategy for computing relative log-likelihood (or perplexity) scores of topic models, based on annealed importance sampling.
◮ The proposed method has smaller Monte Carlo error than previous approaches, leading to marked improvements in both accuracy and computation time.
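To make the core idea concrete, the sketch below shows generic annealed importance sampling on a 1-D toy problem (a Gaussian target with a standard-normal base distribution), not the proposed topic-model estimator itself: a sequence of tempered distributions bridges the base and the target, and the averaged importance weights estimate the ratio of normalizing constants — the same quantity that underlies a log-likelihood score. All function names and parameter settings here are illustrative assumptions.

```python
import numpy as np

def ais_log_normalizer(log_f0, log_f1, sample_f0,
                       n_chains=2000, n_temps=300, step=0.6, seed=0):
    """Estimate log(Z1/Z0) via annealed importance sampling (Neal, 2001).

    log_f0: log of the (normalized) base density, normalizer Z0 = 1.
    log_f1: log of the unnormalized target density, normalizer Z1.
    sample_f0: draws i.i.d. samples from the base distribution.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps)   # annealing schedule
    x = sample_f0(rng, n_chains)             # start chains at the base
    logw = np.zeros(n_chains)                # running log importance weights
    for j in range(1, n_temps):
        b_prev, b = betas[j - 1], betas[j]
        # Weight update: density ratio of consecutive tempered distributions.
        logw += (b - b_prev) * (log_f1(x) - log_f0(x))
        # One Metropolis-Hastings move targeting f0^(1-b) * f1^b.
        def log_pi(z, b=b):
            return (1.0 - b) * log_f0(z) + b * log_f1(z)
        prop = x + step * rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < log_pi(prop) - log_pi(x)
        x = np.where(accept, prop, x)
    # log of the mean importance weight, computed stably (log-sum-exp).
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))

# Toy example: base N(0,1); unnormalized target exp(-(x-2)^2 / (2*0.5^2)),
# whose true normalizer is 0.5 * sqrt(2*pi), i.e. log Z1 ~ 0.2258.
log_f0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_f1 = lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2
estimate = ais_log_normalizer(log_f0, log_f1,
                              lambda rng, n: rng.standard_normal(n))
```

In the topic-model setting, `log_f1` would be the unnormalized posterior over a held-out document's latent variables and the estimated normalizer its marginal likelihood; averaging many chains reduces the Monte Carlo error of the score.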