Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty