Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning