Benefits from Variational Regularization in Language Models