Training trajectories, mini-batch losses and the curious role of the learning rate