A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck