NISTIR 7310 Evaluating Reasoning Systems