Bregman Gradient Policy Optimization