Combining Model-based and Model-free RL via Multi-step Control Variates