Improving Policy Generalization for Teacher-Student Reinforcement Learning