Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent