OBJECTIVE
To develop a system for the automatic classification of Cancer Registry notification data from free-text pathology reports.
METHOD
The extraction of cancer notification items is based on a symbolic, rule-based classification methodology, in which formal semantics are used to reason with the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concepts identified in the free text. Business rules for cancer notifications used by Cancer Registry coding staff were also incorporated, with the aim of mimicking Cancer Registry processes.
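To make the idea of reasoning over identified concepts concrete, the following is a minimal sketch, not the system described here. It assumes that concepts have already been extracted from the report text, and it uses a tiny hypothetical is-a hierarchy with placeholder identifiers (not real SNOMED CT codes) and a single hypothetical business rule: a report is notifiable if any identified concept is subsumed by a notifiable root concept.

```python
from typing import Dict, Iterable, Set

# Hypothetical is-a hierarchy (child -> parents). The identifiers are
# illustrative placeholders, not real SNOMED CT concept codes.
IS_A: Dict[str, Set[str]] = {
    "adenocarcinoma_of_colon": {"malignant_neoplasm_of_colon"},
    "malignant_neoplasm_of_colon": {"malignant_neoplasm"},
    "malignant_neoplasm": {"neoplasm"},
    "benign_naevus": {"benign_neoplasm"},
    "benign_neoplasm": {"neoplasm"},
}

# Hypothetical business rule: concepts under these roots make a report notifiable.
NOTIFIABLE_ROOTS: Set[str] = {"malignant_neoplasm"}


def ancestors(concept: str) -> Set[str]:
    """Return all transitive is-a ancestors of a concept."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(concept: str, root: str) -> bool:
    """True if `concept` is `root` itself or one of its descendants."""
    return concept == root or root in ancestors(concept)


def is_notifiable(concepts: Iterable[str]) -> bool:
    """Apply the assumed rule to the concepts identified in one report."""
    return any(
        is_subsumed_by(concept, root)
        for concept in concepts
        for root in NOTIFIABLE_ROOTS
    )
```

In this sketch, `is_notifiable(["adenocarcinoma_of_colon"])` is true because the concept reaches `malignant_neoplasm` through the hierarchy, while a report containing only `benign_naevus` is not flagged. A real rule set would need to handle context such as negation and uncertainty, which is omitted here.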
RESULTS
The system was developed on a corpus of 239 histology and cytology reports (60% notifiable) and then evaluated on an independent set of 300 reports (20% notifiable). Results show that the system can reliably classify notifiable reports with 96% and 100% specificity, and achieve an overall accuracy of 82% and 74% for classifying notification items from notifiable reports at a unit record level, on the development and evaluation sets, respectively.
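For readers less familiar with the measures reported above, the sketch below shows how they could be computed. Specificity is the standard TN / (TN + FP). The accuracy function reflects one plausible reading of "unit record level", namely that a record counts as correct only when every notification item matches the gold standard; that reading, the function names, and the item names used in the usage note are assumptions for illustration only.

```python
from typing import Dict, Sequence


def specificity(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of truly non-notifiable reports classified as non-notifiable."""
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if true_neg + false_pos == 0:
        raise ValueError("no negative (non-notifiable) reports in y_true")
    return true_neg / (true_neg + false_pos)


def unit_record_accuracy(
    gold: Sequence[Dict[str, str]],
    predicted: Sequence[Dict[str, str]],
) -> float:
    """Fraction of records whose items all match the gold standard.

    Assumption: a record is correct only if every item agrees.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no records to score")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)
```

As a usage example with made-up records, scoring `[{"site": "colon", "morphology": "adenocarcinoma"}, {"site": "lung", "morphology": "carcinoma"}]` against a prediction that differs only in the second record's `morphology` item counts one of two records as correct.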
CONCLUSION
Cancer Registries collect a multitude of data that requires manual review, slowing down the flow of information. Extracting and providing an automatically coded cancer pathology notification for review can lessen the reliance on expert clinical staff, improving the efficiency and availability of cancer information.