Automatic identification of methotrexate-induced liver toxicity in patients with rheumatoid arthritis from the electronic medical record

OBJECTIVES To improve the accuracy of mining the structured and unstructured components of the electronic medical record (EMR) by adding temporal features, in order to automatically identify patients with rheumatoid arthritis (RA) who have methotrexate-induced liver transaminase abnormalities.

MATERIALS AND METHODS Codified information and a string-matching algorithm were applied to an RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was our key method. For features, the Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to extract standard vocabulary from relevant sections of the unstructured clinical narrative. Temporal features were further extracted to assess the temporal relevance of event mentions with respect to the date of the transaminase abnormality. All features were encapsulated in a 3-month-long episode for classification. Results were summarized at the patient level for a training set (N=480 patients) and evaluated on a test set (N=120 patients).

RESULTS On the test set, the system achieved a positive predictive value (PPV) of 0.756, a sensitivity of 0.919, and an F1 score of 0.829, significantly better than the best baseline system (PPV 0.590, sensitivity 0.703, F1 score 0.642). Our innovations, including framing the phenotyping problem as an episode-level classification task and adding temporal information, proved highly effective.

CONCLUSIONS Automated discovery of the methotrexate-induced liver toxicity phenotype in patients with RA, based on structured and unstructured information in the EMR, yields accurate results. Our work demonstrates that adding temporal features significantly improves classification results.
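The episode-level framing described under MATERIALS AND METHODS can be illustrated with a minimal sketch. Everything specific below is hypothetical and not taken from the authors' system: the 45-days-either-side window used to approximate a 3-month episode, the 7-day "concurrent" threshold, the `concept@bucket` feature encoding, the hand-set weights standing in for a trained supervised classifier, and the rule that a patient counts as positive if any of their episodes is positive.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: a "3-month" episode is modeled as 45 days either side of the abnormal lab date.
HALF_EPISODE = timedelta(days=45)
# Assumption: mentions within this many days of the lab are treated as concurrent with it.
CONCURRENT_WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class Mention:
    concept: str     # hypothetical normalized concept label (e.g. derived from cTAKES output)
    note_date: date  # date of the note containing the mention


def temporal_bucket(mention_date: date, lab_date: date) -> str:
    """Place a mention before, around, or after the transaminase abnormality."""
    delta = mention_date - lab_date
    if abs(delta) <= CONCURRENT_WINDOW:
        return "concurrent"
    return "before" if delta < timedelta(0) else "after"


def episode_features(lab_date: date, mentions: list[Mention]) -> Counter:
    """Bag of concept@bucket features for the mentions that fall inside one episode."""
    feats: Counter = Counter()
    for m in mentions:
        if abs(m.note_date - lab_date) <= HALF_EPISODE:
            feats[f"{m.concept}@{temporal_bucket(m.note_date, lab_date)}"] += 1
    return feats


def classify_episode(feats: Counter, weights: dict[str, float], bias: float = 0.0) -> bool:
    """Linear decision rule; the weights are placeholders for a trained model."""
    score = bias + sum(weights.get(f, 0.0) * n for f, n in feats.items())
    return score > 0


def classify_patient(lab_dates, mentions, weights, bias=0.0) -> bool:
    """Assumption: a patient is positive if any of their episodes is positive."""
    return any(
        classify_episode(episode_features(d, mentions), weights, bias) for d in lab_dates
    )


# Toy usage with made-up concepts and weights.
mentions = [
    Mention("methotrexate", date(2010, 2, 1)),   # 30 days before the lab -> "before"
    Mention("elevated_alt", date(2010, 3, 4)),   # 1 day after the lab -> "concurrent"
    Mention("hepatitis_b", date(2008, 1, 1)),    # outside the episode window, ignored
]
weights = {
    "methotrexate@before": 0.5,
    "elevated_alt@concurrent": 1.0,
    "hepatitis_b@before": -2.0,
}
print(classify_patient([date(2010, 3, 3)], mentions, weights, bias=-1.0))  # → True
```

Anchoring every feature to a single abnormal-lab date is what lets the same concept carry different weight depending on whether it appears before, around, or after the abnormality; a real system would learn those weights from the annotated training set rather than set them by hand.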
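As a quick consistency check on the RESULTS, and assuming the standard F1 definition (the harmonic mean of PPV and sensitivity), the reported F1 scores can be recomputed from the published PPV and sensitivity values. The function name `f1` is illustrative.

```python
def f1(ppv: float, sensitivity: float) -> float:
    """Standard F1: harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


print(f"{f1(0.756, 0.919):.3f}")  # proposed system → 0.830
print(f"{f1(0.590, 0.703):.3f}")  # best baseline → 0.642
```

The baseline value matches exactly; the recomputed 0.830 versus the reported 0.829 for the proposed system is presumably due to rounding of the published PPV and sensitivity.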
