Models of human preference for learning reward functions