Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models