Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis