Safety-constrained reinforcement learning with a distributional safety critic