What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?