Editorial Comments: Diagnostic Decision Support Systems: How to Determine the Gold Standard?

In a 1996 editorial on the evaluation of decision support systems, Miller proposed that the bottom line in evaluating clinical decision support systems (CDSSs) should be “whether the user plus the system is better than the unaided user with respect to a specified task….”1 Since 1996, several studies have examined that issue, yet there is still disagreement about how to operationalize Miller's proposition. In this issue of the Journal, Ramnarayan et al.2 describe a variety of metrics to evaluate the performance of a new pediatric diagnostic program, ISABEL. In a previous issue, Fraser et al.3 also described metrics to evaluate a heart disease program, the HDP. Both Ramnarayan et al. and Fraser et al. discussed how their measures compared with the earlier measures used by Berner et al.4 and Friedman et al.5 to evaluate other diagnostic programs. Why should it be so difficult to agree on a reasonable metric for evaluating these systems? Those of us who have struggled with this issue in our research have come to appreciate some of the difficulties that may not be immediately obvious in the published literature but are important to articulate. Many of these issues are not unique to diagnostic programs; they are a challenge in evaluating any CDSS. However, diagnostic programs are particularly challenging because, as Ramnarayan et al. indicate, they should influence both the diagnosis and the management plan. With that in mind, and with Miller's injunction to focus on evaluating how the system and clinician work together, I would like to discuss the problems that arise with the different “gold standards” that researchers have used and to offer suggestions for researchers and developers of diagnostic CDSSs. Most researchers have included in their metrics the production of the …

[1] William J. Long, et al. Evaluation of a Cardiac Diagnostic Program in a Typical Clinical Setting. J Am Med Inform Assoc, 2003.

[2] J. P. Kassirer, et al. A report card on computer-assisted diagnosis--the grade: C. N Engl J Med, 1994.

[3] Fredric M. Wolf, et al. Effects of a decision support system on the diagnostic accuracy of users: a preliminary report. J Am Med Inform Assoc, 1996.

[4] Paul M. Taylor, et al. Measuring the Impact of Diagnostic Decision Support on the Quality of Clinical Decision Making: Development of a Reliable and Valid Composite Score. J Am Med Inform Assoc, 2003.

[5] Douglas B. Fridsma, et al. Computer Decision Support as a Source of Interpretation Error: The Case of Electrocardiograms. J Am Med Inform Assoc, 2003.

[6] A. L. Baker, et al. Performance of four computer-based diagnostic systems. N Engl J Med, 1994.

[7] Andrea Everard, et al. Cognitive fit and an intelligent agent for a word processor: should users take all that advice? Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003.

[8] Eta S. Berner, et al. Clinician Performance and Prominence of Diagnoses Displayed by a Clinical Diagnostic Decision Support System. AMIA, 2003.

[9] Randolph A. Miller, et al. Evaluating Evaluations of Medical Diagnostic Systems. J Am Med Inform Assoc, 1996.

[10] Michael D. Miller, et al. The Impact of a Decision Support System on Physician Work-up Strategies. AMIA, 2000.