Learning the Balance between Exploration and Exploitation via Reward