Multiple features for clinical relation extraction: A machine learning approach

Relation extraction aims to discover relational facts about entity mentions in plain text. In this work, we focus on clinical relation extraction: given a medical record with mentions of drugs and their attributes, we identify relations between these entities. We propose a machine learning model with a novel set of knowledge-based and BioSentVec embedding features. We systematically investigate the impact of these features alongside standard distance- and word-based features, conducting experiments on two benchmark datasets of clinical texts from the MADE 2018 and n2c2 2018 shared tasks. For comparison with the feature-based model, we utilize state-of-the-art models and three BERT-based models, including BioBERT and Clinical BERT. Our results demonstrate that distance and word features provide significant benefits to the classifier. Knowledge-based features improve classification results only for particular types of relations. Among the explored features, the sentence embedding feature provides the largest improvement on the MADE corpus. The classifier obtains state-of-the-art performance in clinical relation extraction with an F-measure of 92.6%, improving F-measure by 3.5% on the MADE corpus.
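
As a rough illustration of what distance- and word-based features for a candidate entity pair might look like, the following minimal sketch builds a sparse feature dictionary from a tokenized sentence. The `Entity` class, the `pair_features` function, the feature names, and the label names are all hypothetical and chosen for illustration; the abstract does not specify the exact feature definitions, and the knowledge-based and BioSentVec features are not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    """A token-level entity mention (hypothetical structure)."""
    text: str
    label: str   # e.g. "Drug" or "Frequency" (assumed label names)
    start: int   # index of the first token
    end: int     # index one past the last token


def pair_features(tokens: List[str], e1: Entity, e2: Entity) -> Dict[str, float]:
    """Build a sparse feature dict for one candidate entity pair."""
    # Order the two mentions by position so the span between them is well defined.
    first, second = (e1, e2) if e1.start <= e2.start else (e2, e1)
    between = tokens[first.end:second.start]

    feats: Dict[str, float] = {
        # Distance feature: number of tokens separating the two mentions.
        "token_distance": float(len(between)),
        # Entity-type pair as a single indicator feature.
        f"types={e1.label}|{e2.label}": 1.0,
    }
    # Word features: indicator for each lowercased token between the mentions.
    for tok in between:
        feats[f"between={tok.lower()}"] = 1.0
    return feats


tokens = "Patient was started on aspirin 81 mg daily".split()
drug = Entity("aspirin", "Drug", 4, 5)
freq = Entity("daily", "Frequency", 7, 8)
features = pair_features(tokens, drug, freq)
```

Dictionaries of this shape could then be vectorized and passed to any standard classifier; which classifier and vectorization the authors actually used is not stated in the abstract.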
