Network formation by reinforcement learning: the long and medium run