Average-Reward Reinforcement Learning for Variance Penalized Markov Decision Problems