Vision Transformer for Contrastive Clustering