SP-ViT: Learning 2D Spatial Priors for Vision Transformers