Challenges of Zero-Shot Recognition with Vision-Language Models: Granularity and Correctness