Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy