Towards Calibrated Robust Fine-Tuning of Vision-Language Models