Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding