Automated Identification of Patients With Pulmonary Nodules in an Integrated Health System Using Administrative Health Plan Data, Radiology Reports, and Natural Language Processing

Introduction: Lung nodules are commonly encountered in clinical practice, yet little is known about their management in community settings. An automated method for identifying patients with lung nodules would greatly facilitate research in this area. Methods: Using data from members of a large, community-based health plan from 2006 to 2010, we developed a method to identify patients with lung nodules by combining five diagnostic codes, four procedural codes, and a natural language processing algorithm that performed free-text searches of radiology transcripts. An experienced pulmonologist reviewed a random sample of 116 radiology transcripts, providing a reference standard for the natural language processing algorithm. Results: Using the automated method, we identified 7112 unique members as having one or more incident lung nodules. The mean age of the patients was 65 years (standard deviation 14 years). There were slightly more women (54%) than men, and Hispanics and non-whites made up 45% of the lung nodule cohort. Thirty-six percent were never smokers, whereas 11% were current smokers. Fourteen percent of the patients were subsequently diagnosed with lung cancer. Compared with clinician review, the sensitivity and specificity of the natural language processing algorithm for identifying the presence of lung nodules were 96% and 86%, respectively. Among the true-positive transcripts in the validation sample, only 35% described a solitary nodule unaccompanied by associated findings, and 56% described nodules measuring 8 to 30 mm in diameter. Conclusions: A combination of diagnostic codes, procedural codes, and a natural language processing algorithm for free-text searching of radiology reports can accurately and efficiently identify patients with incident lung nodules, many of whom are subsequently diagnosed with lung cancer.
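
To illustrate the kind of validation described in the abstract, the following minimal Python sketch flags reports with a free-text search and scores the flags against a reviewer's labels. It is not the study's algorithm: the search terms, the negation cues, the function names (`mentions_nodule`, `sensitivity_specificity`), and the five toy reports are all assumptions made for illustration, and the toy numbers bear no relation to the reported 96% and 86%.

```python
import re

# Hypothetical search terms; the abstract does not list the study's actual terms.
NODULE_PATTERN = re.compile(r"\b(nodule|nodules|nodular)\b", re.IGNORECASE)
# Naive negation cues (also an assumption), checked in the text before the match.
NEGATION_PATTERN = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)


def mentions_nodule(report: str) -> bool:
    """Return True if any sentence mentions a nodule with no preceding negation cue."""
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        match = NODULE_PATTERN.search(sentence)
        if match and not NEGATION_PATTERN.search(sentence[:match.start()]):
            return True
    return False


def sensitivity_specificity(predicted, reference):
    """Compute (sensitivity, specificity) of predicted flags against reference labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference labels must include both positives and negatives")
    return tp / (tp + fn), tn / (tn + fp)


# Toy reports with made-up reviewer labels (True = lung nodule present).
toy_sample = [
    ("There is a 9 mm nodule in the right upper lobe.", True),
    ("No pulmonary nodules or masses.", False),
    ("Lungs are clear.", False),
    ("Nodular opacity in the left lower lobe. No effusion.", True),
    ("History of thyroid nodule. Lungs are clear.", False),
]

predicted = [mentions_nodule(text) for text, _ in toy_sample]
reference = [label for _, label in toy_sample]
sensitivity, specificity = sensitivity_specificity(predicted, reference)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

The last toy report shows how a plain keyword search can produce a false positive (a thyroid nodule rather than a lung nodule), which is the kind of error that specificity captures.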
