On Average versus Discounted Reward Temporal-Difference Learning 1