A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity