Facilitating pharmacogenetic studies using electronic health records and natural-language processing: a case study of warfarin

OBJECTIVE DNA biobanks linked to comprehensive electronic health record systems are potentially powerful resources for pharmacogenetic studies. This study sought to develop natural-language-processing algorithms that extract drug-dose information from clinical text, and to assess how well such tools can automate data extraction for pharmacogenetic studies.

MATERIALS AND METHODS A manually validated warfarin pharmacogenetic study identified a cohort of 1125 patients with a stable warfarin dose, of whom 776 were managed by Coumadin Clinic physicians and the remaining 349 by their providers. The authors developed two algorithms to extract weekly warfarin doses from these two data sets: a regular expression-based program for semistructured Coumadin Clinic notes, and an advanced weekly dose calculator, built on an existing medication information extraction system (MedEx), for narrative providers' notes. The performance of the weekly dose-extraction programs was evaluated by comparing their output with a gold standard of manually curated weekly doses; precision, recall, F-measure, and overall accuracy were reported. The authors then tested for association between the automatically extracted stable weekly dose of warfarin and four known genetic variants in the VKORC1 and CYP2C9 genes, using linear regression adjusted for age, gender, and body mass index.

RESULTS The authors' evaluation showed that the MedEx-based system could determine patients' weekly warfarin doses with 99.7% recall, 90.8% precision, and 93.8% accuracy. Using the automatically extracted weekly doses of warfarin, the authors successfully replicated the previously known associations between stable warfarin dose and genetic variants in VKORC1 and CYP2C9.
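
As a rough illustration of the regular expression-based approach and of the evaluation metrics, the following Python sketch sums per-day doses from a semistructured note and computes a balanced F-measure. It is not the authors' program: the note layout (`Mon: 5 mg, ...`), the pattern `DOSE_PATTERN`, and the helpers `weekly_dose` and `f_measure` are all hypothetical, and the abstract does not say which F-measure variant was used, so the balanced form (F1) is assumed.

```python
import re

# Hypothetical note layout, e.g. "Mon: 5 mg, Tue: 2.5 mg, ...".
# The real Coumadin Clinic note template is not described in the abstract.
DOSE_PATTERN = re.compile(
    r"\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun)[a-z]*\s*[:=]?\s*(\d+(?:\.\d+)?)\s*mg\b",
    re.IGNORECASE,
)


def weekly_dose(note):
    """Return the weekly dose in mg, or None unless all seven days are found."""
    doses = {}
    for day, mg in DOSE_PATTERN.findall(note):
        # Key on the three-letter day prefix; a repeated day overwrites the earlier value.
        doses[day.lower()] = float(mg)
    if len(doses) != 7:
        return None
    return sum(doses.values())


def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    note = ("Mon: 5 mg, Tue: 5 mg, Wed: 2.5 mg, Thu: 5 mg, "
            "Fri: 5 mg, Sat: 2.5 mg, Sun: 5 mg")
    print(weekly_dose(note))
    # Precision and recall as reported in RESULTS for the MedEx-based system.
    print(round(f_measure(0.908, 0.997), 3))
```

Returning `None` for an incomplete week, rather than a partial sum, is a design choice of this sketch: it keeps an under-specified note from being silently counted as a low weekly dose. Under the F1 assumption, the reported precision and recall correspond to an F-measure of about 0.95.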
