A Highly Specific Algorithm for Identifying Asthma Cases and Controls for Genome-Wide Association Studies

Our aim was to identify asthmatic patients as cases and healthy patients as controls for genome-wide association studies (GWAS), using readily available data from electronic medical records. GWAS require highly specific phenotype identification so that genotype-phenotype correlations can be detected accurately. We developed two algorithms using a combination of diagnoses, medications, and smoking history. By applying stringent criteria for the source and specificity of the data, we achieved a 95% positive predictive value and a 96% negative predictive value for identifying asthma cases and controls, compared against clinician review. This high specificity came at the cost of approximately 24% of the potential asthma cases initially identified. However, standardizing our algorithm and applying it across multiple sites could still yield the large number of cases needed for a GWAS.
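The predictive values reported above are ratios over a comparison of algorithm output against clinician review. As a minimal sketch of how such values are computed, the Python below uses hypothetical names (ReviewCounts and the two helper functions) and purely illustrative counts; none of these are taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ReviewCounts:
    """Agreement between algorithm labels and clinician review (hypothetical)."""
    true_pos: int   # algorithm: case,    clinician: case
    false_pos: int  # algorithm: case,    clinician: not a case
    true_neg: int   # algorithm: control, clinician: control
    false_neg: int  # algorithm: control, clinician: not a control


def positive_predictive_value(c: ReviewCounts) -> float:
    """Fraction of algorithm-identified cases confirmed by review."""
    return c.true_pos / (c.true_pos + c.false_pos)


def negative_predictive_value(c: ReviewCounts) -> float:
    """Fraction of algorithm-identified controls confirmed by review."""
    return c.true_neg / (c.true_neg + c.false_neg)


# Illustrative counts only, NOT data from the study.
example = ReviewCounts(true_pos=38, false_pos=2, true_neg=48, false_neg=2)
ppv = positive_predictive_value(example)
npv = negative_predictive_value(example)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both measures depend on how common asthma is in the reviewed sample, so values obtained at one site would not necessarily carry over to another site with a different patient mix.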