Learning strict Nash equilibria through reinforcement