Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training