How well do contrastively trained models transfer?