OBJECTIVE
To develop a system for the automatic classification of pathology reports for Cancer Registry notifications.
METHOD
A two-pass approach is proposed to classify whether or not pathology reports are cancer notifiable. The first pass queries pathology HL7 messages for known report types that are received by the Queensland Cancer Registry (QCR), while the second pass aims to analyse the free-text reports and identify those that are cancer notifiable. The system incorporates Cancer Registry business rules, natural language processing, and symbolic reasoning using the SNOMED CT ontology.
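The two passes can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. It assumes pipe-delimited HL7 v2 messages, takes the report type from the identifier component of OBR-4, and takes the report text from OBX-5. The report-type set `KNOWN_REPORT_TYPES`, the term list `MALIGNANCY_TERMS` and the negation cues are hypothetical. A simple keyword-and-negation rule stands in for the paper's NLP and SNOMED CT reasoning, which it does not reproduce.

```python
import re
from typing import Optional

# Hypothetical OBR-4 identifiers standing in for the report types the
# registry receives; the actual codes are not given in the abstract.
KNOWN_REPORT_TYPES = {"HIST", "CYTO"}

# Hypothetical malignancy terms and negation cues: a crude stand-in for the
# paper's NLP and SNOMED CT reasoning, not a reproduction of it.
MALIGNANCY_TERMS = re.compile(
    r"\b(?:carcinoma|adenocarcinoma|melanoma|lymphoma|sarcoma|malignan\w*)\b",
    re.IGNORECASE,
)
NEGATION_CUES = re.compile(r"\b(?:no|negative for|without|free of)\b", re.IGNORECASE)


def _segments(hl7_message: str):
    """Split an HL7 v2 message into segments, each a list of '|' fields."""
    for line in re.split(r"\r\n?|\n", hl7_message):
        if line:
            yield line.split("|")


def report_type(hl7_message: str) -> Optional[str]:
    """Return the identifier component of OBR-4 from the first OBR segment."""
    for fields in _segments(hl7_message):
        if fields[0] == "OBR" and len(fields) > 4:
            return fields[4].split("^")[0]
    return None


def report_text(hl7_message: str) -> str:
    """Join the OBX-5 observation values into one free-text report."""
    return " ".join(
        fields[5] for fields in _segments(hl7_message)
        if fields[0] == "OBX" and len(fields) > 5
    )


def text_is_notifiable(text: str) -> bool:
    """Flag text if a sentence mentions a malignancy term with no negation cue before it."""
    for sentence in re.split(r"(?<=[.;])\s+", text):
        for match in MALIGNANCY_TERMS.finditer(sentence):
            if not NEGATION_CUES.search(sentence[: match.start()]):
                return True
    return False


def classify(hl7_message: str) -> bool:
    # Pass 1: report types outside the (hypothetical) known set are treated as not notifiable.
    if report_type(hl7_message) not in KNOWN_REPORT_TYPES:
        return False
    # Pass 2: analyse the free text of the remaining reports.
    return text_is_notifiable(report_text(hl7_message))
```

In this sketch the cheap structured check runs first. As a result, only reports of a relevant type reach the text analysis, and a real system would replace the keyword rule with proper concept extraction and ontology reasoning.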
RESULTS
The system was developed on a corpus of 500 histology and cytology reports (47% notifiable) and evaluated on an independent set of 479 reports (52% notifiable). Results show that the system can reliably classify cancer-notifiable reports, with a sensitivity, specificity, and positive predictive value (PPV) of 0.99, 0.95, and 0.95, respectively, on the development set, and 0.98, 0.96, and 0.96 on the evaluation set. High sensitivity can be achieved at a slight cost in specificity and PPV.
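The reported measures follow the standard definitions over a confusion matrix, treating a notifiable report as a positive. Sensitivity is TP/(TP+FN), specificity is TN/(TN+FP), and PPV is TP/(TP+FP). The function below is a minimal sketch of these definitions, and the counts in the usage line are toy values, not data from this study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and PPV, treating 'notifiable' as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),  # share of notifiable reports that are flagged
        "specificity": tn / (tn + fp),  # share of non-notifiable reports left unflagged
        "ppv": tp / (tp + fp),          # share of flagged reports that are notifiable
    }


# Toy counts for illustration only.
print(screening_metrics(tp=8, fp=1, tn=9, fn=2))
```

False positives appear in the denominators of both specificity and PPV. A classifier tuned to miss fewer notifiable reports (fewer FN) typically admits more FP, which lowers both of those measures while raising sensitivity.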
CONCLUSION
The system demonstrates how medical free-text processing enables cancer-notifiable pathology reports to be classified with high reliability, for potential use by Cancer Registries and pathology laboratories.