Extracting Findings from Narrative Reports: Software Transferability and Sources of Physician Disagreement

While natural language processing systems are beginning to see clinical use, it remains unclear whether they can be disseminated effectively through the health care community. MedLEE, a general-purpose natural language processor developed for Columbia-Presbyterian Medical Center, was compared with physicians in its ability to detect seven clinical conditions in 200 Brigham and Women's Hospital chest radiograph reports. Applying the system to the new institution's reports resulted in a small but measurable drop in performance (it was distinguishable from physicians at p = 0.011). Adjusting the interpretation of the processor's coded output, without changing the processor itself, better accommodated local behavior and improved performance to the point that it was indistinguishable from that of the physicians. Pairs of physicians disagreed on at least one condition for 22% of reports; the sources of disagreement appeared to be interpretation of findings, gauging of the likelihood and degree of disease, and coding errors.
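
The pairwise disagreement figure can be illustrated with a minimal sketch. It assumes each rater's coding of a report is a tuple of present/absent flags, one per condition, and that the rate is pooled over every pair of raters and every report; the abstract does not specify the exact pooling, so this is an illustrative assumption. The function name, rater labels, and toy data are all hypothetical and are not taken from the study.

```python
from itertools import combinations


def pairwise_disagreement_rate(codings):
    """Fraction of (rater pair, report) comparisons in which the two raters
    differ on at least one condition.

    codings: dict mapping a rater label to a list of per-report tuples,
    each tuple holding one boolean per condition (hypothetical layout).
    """
    raters = sorted(codings)
    disagreements = 0
    comparisons = 0
    for a, b in combinations(raters, 2):
        for report_a, report_b in zip(codings[a], codings[b]):
            comparisons += 1
            # Tuples differ if any single condition flag differs.
            if report_a != report_b:
                disagreements += 1
    if comparisons == 0:
        raise ValueError("need at least two raters and one report")
    return disagreements / comparisons


# Toy data (invented): 3 raters, 4 reports, 2 conditions per report.
toy_codings = {
    "A": [(True, False), (False, False), (True, True), (False, False)],
    "B": [(True, False), (False, True), (True, True), (False, False)],
    "C": [(True, False), (False, False), (True, False), (False, False)],
}
rate = pairwise_disagreement_rate(toy_codings)
```

With the toy data, 4 of the 12 pair-by-report comparisons contain a disagreement, so `rate` is one third. A report counts as a disagreement as soon as one condition differs, which mirrors the "at least one condition" wording above.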