Scalable Safe Policy Improvement via Monte Carlo Tree Search