A near-optimal polynomial time algorithm for learning in certain classes of stochastic games