Evaluating Superhuman Models with Consistency Checks