Development of an algorithm to identify metastatic prostate cancer in electronic medical records using natural language processing.

164 Background: Men with prostate cancer who develop metastases are difficult to identify through administrative diagnosis codes because of the protracted time to metastasis, limited survival, and inconsistent use of specific codes. As a result, the research needed to inform the delivery of high-quality care in this setting is limited. The goal of this study was therefore to develop an algorithm that applies natural language processing (NLP) to electronic medical record (EMR) data to identify men who progress to metastatic prostate cancer after diagnosis.
Methods: An electronic algorithm was developed that uses NLP to search unstructured text for evidence of progression to metastases among men diagnosed with prostate cancer between 1992 and 2010 in a large, diverse cohort enrolled in an ongoing study of prostate cancer mortality. A training set of 449 men diagnosed with early-stage prostate cancer was used for development. Pathology reports, radiology reports, and clinic notes were searched from diagnosis until death or loss to follow-up. Pathology reports were searched for mention of adenocarcinoma in a metastatic lesion, radiology reports for abnormal findings consistent with metastases, and clinic notes for mentions of increasing pain or narcotic use related to metastases. Each NLP component was validated against manual review of the corresponding records.
Results: Of the 449 men in the training set, 40 (8.9%) were found to have metastatic prostate cancer. Most cases (98%) had evidence of metastases in their clinic notes; radiology reports identified 18% of cases and pathology reports 5%. Of the 40 cases identified, 25% did not have a corresponding ICD-9 code for metastatic cancer. However, 7.5% used androgen deprivation therapy (ADT), 37.5% had increasing oncology visits, and 22.5% had rapidly rising prostate-specific antigen (PSA) levels.
Conclusions: Our results suggest that NLP can identify men with metastatic prostate cancer in the EMR more accurately than diagnosis codes alone. Automated identification of patients with metastatic cancer facilitates quality-of-care research in this setting, supporting the delivery of appropriate, high-quality care.
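The search strategy described in the methods, with a different target for each note type, can be illustrated with a minimal rule-based sketch. This is not the study's algorithm: the abstract does not list the terms or rules actually used, so the patterns in `PATTERNS`, the crude sentence-level negation filter, and the names `Note`, `note_is_positive`, and `positive_sources` are all assumptions made for illustration.

```python
import re
from typing import NamedTuple

class Note(NamedTuple):
    note_type: str  # "pathology", "radiology", or "clinic"
    text: str

# Hypothetical patterns; every pattern listed for a note type must match.
PATTERNS = {
    "pathology": [r"\badenocarcinoma\b", r"\bmetasta\w*"],
    "radiology": [r"\b(?:consistent|compatible) with\b|\bsuspicious for\b",
                  r"\bmetasta\w*"],
    "clinic": [r"\b(?:increas\w*|worsen\w*)\b[^.]*\b(?:pain|narcotic\w*|opioid\w*)\b",
               r"\bmetasta\w*"],
}
COMPILED = {k: [re.compile(p, re.I) for p in v] for k, v in PATTERNS.items()}

# Crude negation cue (an assumption): sentences containing it are ignored.
NEGATION = re.compile(r"\b(?:no|not|without|negative for)\b", re.I)

def note_is_positive(note: Note) -> bool:
    """Return True if the note's non-negated sentences match all patterns."""
    patterns = COMPILED.get(note.note_type)
    if not patterns:
        return False  # unknown note types are never flagged
    sentences = re.split(r"(?<=[.!?])\s+", note.text)
    kept = " ".join(s for s in sentences if not NEGATION.search(s))
    return all(p.search(kept) for p in patterns)

def positive_sources(notes: list[Note]) -> set[str]:
    """Return the note types that carry evidence of metastases for one patient."""
    return {n.note_type for n in notes if note_is_positive(n)}

# Made-up example notes for a single patient.
example = [
    Note("radiology", "Bone scan shows no evidence of metastases."),
    Note("clinic", "Patient reports increasing back pain. "
                   "Likely related to bone metastases."),
]
flagged = positive_sources(example)
```

In this sketch a patient would be flagged when `positive_sources` returns a non-empty set, and the returned note types indicate which source supplied the evidence. Any real rule set would need validation against manual chart review, as the abstract describes for each NLP component.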