Validation of Case Finding Algorithms for Hepatocellular Cancer From Administrative Data and Electronic Health Records Using Natural Language Processing

Background: Accurate identification of hepatocellular cancer (HCC) cases from automated data is needed for efficient and valid quality improvement initiatives and research. We validated International Classification of Diseases, 9th Revision (ICD-9) codes for HCC and evaluated whether natural language processing by the Automated Retrieval Console (ARC) for document classification improves HCC identification.

Methods: We identified a cohort of patients with ICD-9 codes for HCC during 2005–2010 from Veterans Affairs administrative data. Pathology and radiology reports were reviewed to confirm HCC. The positive predictive value (PPV), sensitivity, and specificity of ICD-9 codes were calculated. A split validation study of pathology and radiology reports was performed to develop and validate ARC algorithms. Reports were manually classified as diagnostic of HCC or not. ARC generated document classification algorithms using the Clinical Text Analysis and Knowledge Extraction System. ARC performance was compared with manual classification, and the PPV, sensitivity, and specificity of ARC were calculated.

Results: A total of 1138 patients with ICD-9 codes for HCC were identified. On the basis of manual review, 773 had HCC. The HCC ICD-9 code algorithm had a PPV of 0.67, sensitivity of 0.95, and specificity of 0.93. For a random subset of 619 patients, we identified 471 pathology reports for 323 patients and 943 radiology reports for 557 patients. The pathology ARC algorithm had a PPV of 0.96, sensitivity of 0.96, and specificity of 0.97. The radiology ARC algorithm had a PPV of 0.75, sensitivity of 0.94, and specificity of 0.68.

Conclusions: A combined approach of ICD-9 codes and natural language processing of pathology and radiology reports improves HCC case identification in automated data.
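As an illustration of the three metrics reported above, the following minimal sketch tallies a 2x2 table from paired labels (algorithm output versus manual review as the reference standard) and computes PPV, sensitivity, and specificity. The names `ConfusionCounts` and `tally` and the example labels are hypothetical; this is not the study's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: algorithm result vs. manual review."""
    tp: int  # algorithm positive, manual review positive
    fp: int  # algorithm positive, manual review negative
    fn: int  # algorithm negative, manual review positive
    tn: int  # algorithm negative, manual review negative


def tally(predicted: Sequence[bool], reference: Sequence[bool]) -> ConfusionCounts:
    """Count the four cells from paired labels (hypothetical helper)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = fn = tn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1
        elif pred and not ref:
            fp += 1
        elif not pred and ref:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Sensitivity: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Specificity: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Made-up labels for five reports; True means "diagnostic of HCC".
    algorithm = [True, True, False, False, True]
    manual = [True, False, False, True, True]
    counts = tally(algorithm, manual)
    print(counts)
    print(f"PPV={ppv(counts):.2f} "
          f"sensitivity={sensitivity(counts):.2f} "
          f"specificity={specificity(counts):.2f}")
```

Note that sensitivity and specificity need reference-standard negatives and missed positives, so they cannot be computed from a cohort of code-positive patients alone; the sketch assumes all four cells are available and does not guard against empty denominators.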
