Learning to schedule multi-NUMA virtual machines via reinforcement learning