Paper information - LSI2_UNED at eHealth-KD Challenge 2019: A Few-shot Learning Model for Knowledge Discovery from eHealth Documents

In this work, we describe a Few-Shot Learning approach for Named Entity Recognition (NER) in eHealth documents to identify and classify key phrases in a document (subtask A in the IberLEF eHealth-KD 2019 competition [10]). The architecture is a hybrid Bi-LSTM and CNN model with four input layers that can recognize multi-word entities using the BIO encoding format for the labels. The system obtained an F-score of 73.15% (the baseline is 54.66%), with a precision of 78.17%, according to the eHealth-KD evaluation procedure. This improvement is due mainly to (a) the selection of a hybrid model for NER, which obtains better results when using a POS tagger, and (b) the addition of Wikidata entities to extend the vocabulary, which improves precision by nearly 10%.
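The BIO encoding mentioned in the abstract can be illustrated with a minimal sketch. It only shows how multi-word key phrases map to per-token B-/I-/O tags and back; it is not the authors' implementation. The function names `bio_encode` and `bio_decode`, the example sentence, and the label names are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical helper: a span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def bio_encode(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn token-index spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


def bio_decode(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Rebuild (phrase, label) pairs from BIO tags.

    An I- tag that does not continue an open entity of the same label
    is treated like O in this sketch.
    """
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            if current:
                entities.append((" ".join(current), label))
            current, label = [], ""
    if current:
        entities.append((" ".join(current), label))
    return entities


# Illustrative sentence and labels (assumed, not taken from the corpus).
tokens = ["El", "asma", "afecta", "las", "vías", "respiratorias"]
spans = [(1, 2, "Concept"), (2, 3, "Action"), (4, 6, "Concept")]
tags = bio_encode(tokens, spans)
entities = bio_decode(tokens, tags)
```

Here the two-token phrase "vías respiratorias" is tagged `B-Concept`, `I-Concept`, which is how a token-level tagger can recover a multi-word entity as a single unit.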

Ana M. García-Serrano | Alicia Lara-Clares

[1] L. F. Rau, et al. Extracting company names from text, 1991, Proceedings. The Seventh IEEE Conference on Artificial Intelligence Application.

[2] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.

[3] Erik Cambria, et al. Recent Trends in Deep Learning Based Natural Language Processing, 2017, IEEE Comput. Intell. Mag.

[4] Sampo Pyysalo, et al. brat: a Web-based Tool for NLP-Assisted Text Annotation, 2012, EACL.

[5] Tianxi Cai, et al. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data, 2018, PSB.

[6] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[7] Paloma Martínez, et al. A Hybrid Bi-LSTM-CRF model for Knowledge Recognition from eHealth documents, 2018, TASS@SEPLN.

[8] Pietro Perona, et al. One-shot learning of object categories, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Paloma Martínez, et al. Simplifying drug package leaflets written in Spanish by using word embedding, 2017, Journal of Biomedical Semantics.

[10] Hiroyuki Shindo, et al. Wikipedia2Vec: An Optimized Implementation for Learning Embeddings from Wikipedia, 2018.

[11] Ana M. García-Serrano, et al. HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset, 2017, Inf. Syst.

[12] Rafael Muñoz, et al. Overview of the eHealth Knowledge Discovery Challenge at IberLEF 2019, 2021, IberLEF@SEPLN.

[13] Andrey Kormilitzin, et al. Few-shot Learning for Named Entity Recognition in Medical Text, 2018, ArXiv.

[14] Ana M. García-Serrano, et al. Formal concept analysis for topic detection: A clustering quality experimental analysis, 2017, Inf. Syst.

[15] Juan Martínez-Romo, et al. Deep neural models for extracting entities and relationships in the new RDD corpus relating disabilities and rare diseases, 2018, Comput. Methods Programs Biomed.

[16] Christoph H. Lampert, et al. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Eric Nichols, et al. Named Entity Recognition with Bidirectional LSTM-CNNs, 2015, TACL.

[18] Ana M. García-Serrano, et al. Experiences at ImageCLEF 2010 using CBIR and TBIR Mixing Information Approaches, 2010, CLEF.