Mechanics of Next Token Prediction with Self-Attention