Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2

The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors for Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. Beyond identifying the risk factors, the track studied their presence and progression over time in the longitudinal records. Twenty teams participated and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system combined additional annotations, external lexicons, hand-written rules, and Support Vector Machines. These results indicate that identifying risk factors and their progression over time is well within the reach of automated systems.
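
For reference, the F1 score quoted in the results is the harmonic mean of precision and recall. The following is a minimal sketch, assuming that system output and gold-standard annotations are compared as sets of (record, risk factor, time) tuples. The tuple layout, the function name `precision_recall_f1`, and the sample annotations are illustrative assumptions, not the official evaluation script of the shared task.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for two collections of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (record id, risk factor, time relative to the visit)
gold = {
    ("rec1", "hypertension", "before"),
    ("rec1", "hypertension", "during"),
    ("rec1", "smoker", "current"),
    ("rec2", "obesity", "during"),
}
predicted = {
    ("rec1", "hypertension", "before"),
    ("rec1", "hypertension", "during"),
    ("rec2", "obesity", "during"),
    ("rec2", "hyperlipidemia", "during"),  # spurious prediction
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case three of the four predictions match the gold standard and three of the four gold annotations are found, so precision, recall, and F1 all equal 0.75.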
