Accurate language model estimation with document expansion