Towards Adversarial Attack on Vision-Language Pre-training Models