Multilayered temporal modeling for the clinical domain

OBJECTIVE To develop an open-source temporal relation discovery system for the clinical domain. The system automatically infers temporal relations between events and time expressions using a multilayered modeling strategy. It can operate at different levels of granularity: from coarse temporality, expressed as the relation of an event to the document creation time (DCT), through temporal containment, to fine-grained classic Allen-style relations.

MATERIALS AND METHODS We evaluated our system on 2 clinical corpora. One is a subset of the Temporal Histories of Your Medical Events (THYME) corpus, which was used in SemEval 2015 Task 6: Clinical TempEval. The other is the 2012 Informatics for Integrating Biology and the Bedside (i2b2) challenge corpus. We designed multiple supervised machine learning models to compute the DCT relation and within-sentence temporal relations. For the i2b2 data, we also developed models and rule-based methods to recognize cross-sentence temporal relations. We used the official evaluation scripts of both challenges so that our results are directly comparable with those of other participating systems. In addition, we conducted a feature ablation study to measure the contribution of individual features to the system's performance.

RESULTS Our system achieved state-of-the-art performance on the Clinical TempEval corpus and was on par with the best systems on the i2b2 2012 corpus. In particular, on the Clinical TempEval corpus, our system established a new F1 score benchmark, a statistically significant improvement over both the baseline and the best participating system.

CONCLUSION Presented here is the first open-source clinical temporal relation discovery system. It was built using a multilayered temporal modeling strategy and achieved top performance in 2 major shared tasks.
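The three levels of granularity named in the objective can be illustrated with a small sketch. It is not the system described here, which learns these relations from text with supervised models; it only shows what each layer expresses once events are reduced to intervals on a timeline. The `Interval` type, the function names, the integer timestamps, and the three-way DCT label set are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical event span on an integer timeline (assumes start < end)."""
    start: int
    end: int


def doc_time_rel(event: Interval, dct: int) -> str:
    """Coarsest layer: relation of an event to the document creation time.

    Simplified, assumed three-way label set.
    """
    if event.end < dct:
        return "BEFORE"
    if event.start > dct:
        return "AFTER"
    return "OVERLAP"


def contains(a: Interval, b: Interval) -> bool:
    """Middle layer: temporal containment, True if a spans all of b."""
    return a.start <= b.start and b.end <= a.end


def allen_relation(a: Interval, b: Interval) -> str:
    """Finest layer: one of the 13 Allen interval relations of a to b."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "met-by"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Remaining cases are partial overlaps in one direction or the other.
    return "overlaps" if a.start < b.start else "overlapped-by"


if __name__ == "__main__":
    admission = Interval(2, 10)   # hypothetical hospital stay
    surgery = Interval(4, 6)      # hypothetical event inside the stay
    print(doc_time_rel(surgery, dct=12))       # coarse: relative to DCT
    print(contains(admission, surgery))        # containment
    print(allen_relation(admission, surgery))  # fine-grained
```

Each layer discards less information than the one before it: the DCT relation needs only one event and the note's timestamp, containment needs a pair of spans, and the Allen relation distinguishes every possible ordering of the four endpoints.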
