On the Dynamics and Convergence of Weight Normalization for Training Neural Networks