Overview of MESINESP, a Spanish Medical Semantic Indexing Task within BioASQ 2020

In this paper, we present an overview of the novel MESINESP task on medical semantic indexing in Spanish, organized within the eighth edition of the BioASQ challenge, which ran as a lab at the Conference and Labs of the Evaluation Forum (CLEF) 2020. BioASQ is a series of challenges that aims to promote systems and methodologies for large-scale biomedical semantic indexing and question answering. MESINESP represents the first attempt to generate resources for the development and evaluation of semantic indexing strategies specialized in health-related content in Spanish. We have generated several publicly accessible Gold Standard collections of manually indexed content, covering medical literature, clinical trials and health project descriptions, associated with controlled terminology in the form of the hierarchical DeCS vocabulary. Manual indexing of MESINESP documents was carried out by professional medical literature indexers, who used a web-based indexing interface specifically adapted for this task. The results obtained by the participating teams were promising, showing that semantically indexed medical literature can also serve as training data for automatic indexing systems that assist the manual indexing of other types of documents, such as clinical trials. MESINESP corpus: https://zenodo.org/record/3746596.Xo9WTIzaFA
