When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations