Stepwise Alignment for Constrained Language Model Policy Optimization