LaSTUS-TALN at IberLEF 2019 eHealth-KD Challenge: Deep Learning Approaches to Information Extraction in Biomedical Texts

This paper presents the participation of the LaSTUS-TALN team in the IberLEF eHealth-KD 2019 challenge, which proposes two subtasks in the context of biomedical text processing in Spanish: i) the detection and classification of key phrases and ii) the identification of the semantic relationships between them. To find and classify the relevant key phrases, we propose an architecture based on a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) classifier as its last layer. For the relation extraction problem, we describe each candidate relationship with a global context, representing the supposed relationship, and a local context, representing the context of the candidate key phrases, and we divide the problem into three simpler classification tasks: i) decide whether the entities are related, ii) identify the type of relationship and iii) obtain the correct direction. In our model, these three classification tasks were trained at the same time. When key phrase extraction and relation extraction were run in sequence, our system achieved the third highest F1 score in the main evaluation.
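The three-way decomposition of relation extraction can be illustrated with a small sketch. It is not the system's implementation: it assumes each classifier outputs a probability vector, that joint training sums the three cross-entropy losses with equal weight, and that the type and direction losses are counted only for related pairs. The function names, the relation labels and the example key phrases are hypothetical.

```python
import math


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class under a probability vector."""
    return -math.log(probs[gold_index])


def joint_loss(related_probs, type_probs, direction_probs, gold):
    """Sum the losses of the three tasks for one candidate pair.

    gold is a (related, rel_type, direction) triple of class indices.
    Assumption: type and direction only contribute when the pair is related.
    """
    related, rel_type, direction = gold
    loss = cross_entropy(related_probs, related)
    if related == 1:
        loss += cross_entropy(type_probs, rel_type)
        loss += cross_entropy(direction_probs, direction)
    return loss


def decode(related_probs, type_probs, direction_probs, type_names, source, target):
    """Combine the three predictions into one (source, type, target) triple."""
    # Task i: are the two key phrases related at all?
    if related_probs[1] <= related_probs[0]:
        return None
    # Task ii: pick the most probable relation type.
    best = max(range(len(type_probs)), key=lambda i: type_probs[i])
    # Task iii: index 1 means the relation runs from target to source.
    if direction_probs[1] > direction_probs[0]:
        source, target = target, source
    return (source, type_names[best], target)


# Hypothetical labels and key phrases, for illustration only.
TYPE_NAMES = ["is-a", "causes"]
prediction = decode([0.2, 0.8], [0.1, 0.9], [0.7, 0.3], TYPE_NAMES, "asma", "enfermedad")
unrelated_loss = joint_loss([0.5, 0.5], [0.5, 0.5], [0.5, 0.5], (0, 0, 0))
```

Summing the losses lets a shared representation of the global and local contexts receive gradients from all three decisions at once, which is one common way to train such tasks at the same time.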
