
Recognizing UMLS Semantic Types with Deep Learning

Entity recognition is a critical first step for a number of clinical NLP applications, such as entity linking and relation extraction. We present the first attempt to apply state-of-the-art entity recognition approaches to a newly released dataset, MedMentions. This dataset contains over 4,000 biomedical abstracts annotated with UMLS semantic types. Compared with existing datasets, MedMentions contains a far greater number of entity types, and thus represents a more challenging but more realistic scenario. We explore a number of relevant dimensions, including contextual versus non-contextual word embeddings, general versus domain-specific unsupervised pre-training, and different deep learning architectures. We contrast our results against the well-known i2b2 2010 entity recognition dataset, and propose a new method to combine general and domain-specific information. While our approach produces a state-of-the-art result on the i2b2 2010 task (F1 = 0.90), our results on MedMentions are significantly lower (F1 = 0.63), suggesting there is still plenty of room for improvement on this new data.
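The abstract reports F1 scores without saying how they are computed. As a rough illustration only, entity recognition is often scored with exact-match, entity-level F1 over BIO-tagged tokens. The sketch below assumes that convention, which may differ from the paper's actual evaluation protocol. The function names `bio_to_spans` and `entity_f1` and the labels `Disease` and `Chemical` are hypothetical placeholders, not MedMentions or i2b2 identifiers.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect (start, end, type) spans from a BIO tag sequence.

    Assumption: strict BIO decoding. An I- tag that does not continue
    a span of the same type is ignored rather than repaired.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues_span = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues_span:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match F1: a predicted span counts only if its boundaries
    and its type both match a gold span."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token sentence with two gold entities.
gold = ["B-Disease", "I-Disease", "O", "B-Chemical", "O"]
# The second entity has the right boundaries but the wrong type.
pred = ["B-Disease", "I-Disease", "O", "B-Disease", "O"]

print(entity_f1(gold, pred))  # 0.5: one of two spans matches exactly
```

Under this scoring, a span with correct boundaries but the wrong type earns no credit, so a dataset with many more entity types offers more ways to miss. That is consistent with, though not proof of, the lower MedMentions score reported above.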

Berry de Bruijn | Kathleen C. Fraser | Isar Nejadgholi | Muqun Li | Astha LaPlante | Khaldoun Zine El Abidine
