XtremeCLIP: Extremely Parameter-efficient Tuning for Low-resource Vision Language Understanding