CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization