Policy optimization by lexicographic preference ordering