A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics