Overview of MedProcNER Task on Medical Procedure Detection and Entity Linking at BioASQ 2023

Recent advances in NLP techniques, including the use of large language models and Transformers, are showing promising results for processing clinical content. The development of tools for automatic recognition of medical concepts, variables, and clinical expressions is key for the semantic analysis of clinical records, semantic search engines and the generation of structured data representations. Despite the importance of medical procedures for management, diagnosis, prevention and prognosis, there are few comprehensive resources for medical procedure extraction and normalization. To foster the development of procedure mention detection and entity linking systems, we have released the MedProcNER (Medical Procedures Named Entity Recognition) corpus, a high-quality, manually annotated collection of 1000 clinical case reports written in Spanish. The corpus has been exhaustively labeled by physicians following detailed annotation guidelines and quality control measures. Additionally, a multilingual Silver Standard corpus has been generated for English, Italian, French, Portuguese, Romanian, Dutch, Swedish and Czech, to provide a clinical NLP resource for research in these languages. A total of 9 teams from 8 different countries participated in the MedProcNER track of BioASQ 2023 (part of CLEF 2023), using mostly Transformer architectures and models such as RoBERTa, BioMBERT, ALBERT, Longformer and SapBERT. MedProcNER was structured into three sub-tracks: a) a Clinical Procedure Entity Recognition task, b) a Clinical Procedure Normalization task to SNOMED CT and c) a Clinical Procedure-based Document Indexing task. The MedProcNER corpus, guidelines, and resources (including cross-mappings to MeSH and ICD-10) are freely available at: https://zenodo
