Improving proximal policy optimization with alpha divergence