GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training