Reference Standards in Evaluating System Performance

The paper in this issue by Hripcsak and Wilcox, “Reference Standards, Judges, and Comparison Subjects: Roles for Experts in Evaluating System Performance,”1 is well written and presents a thoughtful analysis of the topic. As the authors acknowledge, however, there is more to the evaluation of clinical informatics systems than can be accomplished through comparison to experts.2,3 Hripcsak and Wilcox focus on “how to use experts in evaluating systems when one needs them,” whereas this commentary focuses on the question, “when should one use experts as part of a system's evaluation?” The two perspectives are complementary rather than contradictory. As noted previously,4

System evaluation in biomedical informatics should take place as an ongoing, strategically planned process, not as a single event or small number of episodes. Complex software systems and accepted medical practices both evolve rapidly, so evaluators and readers of evaluations face moving targets. … [C]urrent thinking recognizes that such systems are of value only when they help users to solve users' problems. Users, not systems, characterize and solve clinical diagnostic problems. The ultimate unit of evaluation should be whether the user plus the system is better than the unaided user with respect to a specified task or problem.… If the ultimate evaluation of a system depends on whether users of the system perform a specified task better when they use the system than when they don't, …