Evaluation metrics and methodologies for user-centered evaluation of intelligent systems

In the past four years, we have worked with several research programs developing intelligent software for use by intelligence analysts. Our role in these programs was to develop metrics and methodologies for assessing the software's impact on its users, in this case intelligence analysts. In particular, we focused on metrics that evaluate how much the intelligent systems contribute to users' tasks and what they cost users in terms of workload and process deviations. In this paper, we describe our approach. We began with two types of preliminary investigation: first, collecting and analyzing data from analysts working in an instrumented environment over a period of two years, and second, developing and conducting formative evaluations of research software. The long-term studies informed our understanding of the processes analysts use and provided candidate metrics from an environment without intelligent software tools. The formative evaluations helped us define sets of application-specific metrics. Finally, we conducted assessments during and after technology insertions. We describe the metrics and methodologies used in each of these activities, along with the lessons learned.