Image-and-Language Understanding from Pixels Only