The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks