Simulated Annealing in Early Layers Leads to Better Generalization