Improving Transformers with Probabilistic Attention Keys