Paper Information - Extraction of Medical Concepts from Italian Natural Language Descriptions (Discussion Paper)

In this paper we present a Natural Language Processing (NLP) pipeline that automatically extracts medical concepts from free text written in a language other than English. To do so, we combine common NLP techniques with the Metathesaurus of the Unified Medical Language System (UMLS). Specifically, given an Italian textual description of a work accident, our goal is to automatically extract the ontological concepts that represent which part of the human body was injured and the nature of the injury. We first partition the text into tokens and assign a part-of-speech tag to each token, and then use a dedicated tool to extract the relevant concepts to be searched within UMLS. We tested our system on a large public repository of textual descriptions of work accidents produced by INAIL. Experimental results confirm that the system correctly extracts relevant medical concepts from texts written in Italian.
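The stages named in the abstract (tokenization, part-of-speech tagging, candidate extraction, concept lookup) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `TOY_POS`, the dictionary `TOY_CONCEPTS`, and all function names are hypothetical stand-ins, and the dictionary only imitates a lookup in the UMLS Metathesaurus with made-up labels rather than real UMLS identifiers.

```python
import re

# Hypothetical toy lexicon: Italian word form -> coarse part-of-speech tag.
# A real pipeline would use a trained Italian POS tagger instead.
TOY_POS = {
    "operaio": "NOUN",
    "frattura": "NOUN",
    "ferita": "NOUN",
    "mano": "NOUN",
    "dito": "NOUN",
    "destra": "ADJ",
}

# Hypothetical stand-in for a UMLS lookup: term -> (category, label).
# The labels are illustrative, not real UMLS concept identifiers.
TOY_CONCEPTS = {
    "mano": ("body_part", "hand"),
    "dito": ("body_part", "finger"),
    "frattura": ("injury_nature", "fracture"),
    "ferita": ("injury_nature", "wound"),
}


def tokenize(text):
    """Split lowercased text into alphabetic tokens (accented vowels included)."""
    return re.findall(r"[a-zàèéìòù]+", text.lower())


def pos_tag(tokens):
    """Tag each token using the toy lexicon; unknown words get the tag 'X'."""
    return [(token, TOY_POS.get(token, "X")) for token in tokens]


def extract_concepts(text):
    """Return the body parts and injury natures found among the nouns of the text."""
    found = {"body_part": [], "injury_nature": []}
    for token, tag in pos_tag(tokenize(text)):
        if tag != "NOUN":
            continue  # only nouns are treated as candidate terms in this sketch
        hit = TOY_CONCEPTS.get(token)
        if hit is not None and hit[1] not in found[hit[0]]:
            found[hit[0]].append(hit[1])
    return found


description = "L'operaio ha riportato una frattura alla mano destra."
print(extract_concepts(description))
```

In this sketch only single nouns are looked up; a fuller version would presumably also handle multi-word terms and inflected forms before querying the terminology.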
