ON THE ABILITY OF GRADIENT DESCENT TO LEARN NEURAL NETWORKS