Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization