ARBOREx: Abbreviation Resolution Based on Regular Expressions for BARR2

ARBOREx, a system for abbreviation recognition and resolution in biomedical texts based on a curated set of lexical resources, is presented. For explicit ShortForm-ExtendedForm (SF-EF) relations found in the text, a ranked dynamic regular expression approach is used, complemented by access to Google Translation Services when needed. In contrast, for EF resolution when an SF candidate has been identified in the source text, a knowledge-based approach is adopted, implemented as a set of simplified patterns that compile to regular expressions applied over the text stream. These patterns select the EF by exploiting linguistic form (word co-occurrence in discourse) as a means to map an SF to its term meaning. A very preliminary conceptual classification of SF-EF pairs is also used in some of the patterns. In addition, a simple bag-of-words model that computes the overlap between the content words in the EF and those in the text serves as a fallback strategy. The system has been used in the BARR2 track, described elsewhere [1]. On the BARR2 datasets, ARBOREx achieved high precision and recall for both tasks, with an F1 score of 88.42 for task 1 and 83.17 for task 2.
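The abstract does not give ARBOREx's actual pattern syntax, ranking criteria, or lexical resources, so the following is only a minimal sketch of the three ideas it names, under stated assumptions. `explicit_pairs` builds a regular expression dynamically from the letters of each parenthesised SF and ranks the matching EF candidates (here, by assumption, fewest words wins). `resolve` first tries hypothetical co-occurrence rules `(SF, EF, cue words)` compiled to regexes, then falls back to bag-of-words overlap of content words. `STOPWORDS`, `LEXICON`, `RULES`, `max_extra`, and all function names are illustrative inventions, and the Google Translation Services step is omitted.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stopword list; words of <= 2 characters are dropped separately.
STOPWORDS: Set[str] = {"con", "del", "las", "los", "para", "por", "una", "que"}


def sf_pattern(letters: List[str]) -> re.Pattern:
    """Dynamic regex built from the SF letters: each letter must begin a word,
    in order, with optional filler words (e.g. 'de la') in between."""
    parts = [re.escape(c) + r"\w*" for c in letters]
    return re.compile(r"(?: \w+)*? ".join(parts), re.IGNORECASE)


def explicit_pairs(text: str, max_extra: int = 3) -> List[Tuple[str, str]]:
    """Find 'extended form (SF)' pairs. Candidates are word spans ending right
    before the parenthesis; assumed ranking: fewest words wins."""
    pairs = []
    for m in re.finditer(r"\(\s*([\w\-]{2,10})\s*\)", text):
        sf = m.group(1)
        if not any(c.isupper() for c in sf):  # skip '(2018)', '(ver)', ...
            continue
        letters = [c for c in sf if c.isalnum()]
        words = re.findall(r"\w+", text[: m.start()])
        window = words[-(len(letters) + max_extra):]  # hypothetical window size
        pattern = sf_pattern(letters)
        candidates = [
            " ".join(window[i:])
            for i in range(len(window))
            if pattern.fullmatch(" ".join(window[i:]))
        ]
        if candidates:
            pairs.append((sf, min(candidates, key=lambda c: len(c.split()))))
    return pairs


def content_words(text: str) -> Set[str]:
    return {w for w in re.findall(r"\w+", text.lower())
            if len(w) > 2 and w not in STOPWORDS}


def resolve_by_overlap(sf: str, context: str,
                       lexicon: Dict[str, List[str]]) -> Optional[str]:
    """Fallback: pick the EF whose content words overlap most with the text."""
    candidates = lexicon.get(sf, [])
    if not candidates:
        return None
    ctx = content_words(context)
    score, best = max(((len(content_words(ef) & ctx), ef) for ef in candidates),
                      key=lambda p: p[0])
    return best if score > 0 else None


# Hypothetical rule format: (SF, EF, cue words that must co-occur in the text).
Rule = Tuple[str, str, List[str]]


def compile_cues(cues: List[str]) -> re.Pattern:
    return re.compile(r"\b(?:" + "|".join(map(re.escape, cues)) + r")\b",
                      re.IGNORECASE)


def resolve(sf: str, context: str, rules: List[Rule],
            lexicon: Dict[str, List[str]]) -> Optional[str]:
    """Try co-occurrence rules first, then fall back to bag-of-words overlap."""
    for rule_sf, ef, cues in rules:
        if rule_sf == sf and compile_cues(cues).search(context):
            return ef
    return resolve_by_overlap(sf, context, lexicon)


# Hypothetical resources for illustration only.
LEXICON = {"IR": ["insuficiencia renal", "insuficiencia respiratoria"]}
RULES: List[Rule] = [("IR", "insuficiencia respiratoria", ["disnea", "oxígeno"])]

print(explicit_pairs("Se detectó el virus de la inmunodeficiencia humana (VIH)."))
# [('VIH', 'virus de la inmunodeficiencia humana')]
print(resolve("IR", "Ingresa por IR con disnea intensa.", RULES, LEXICON))
# insuficiencia respiratoria
print(resolve("IR", "Paciente con IR y creatinina elevada; inicia terapia "
              "renal sustitutiva.", RULES, LEXICON))
# insuficiencia renal
```

Ordering the rules before the overlap model mirrors the abstract's description of bag-of-words as a fallback; a real system would need far richer lexicons and a better tie-breaking policy than `max` returning the first best candidate.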

Fernando Sánchez León

[1] Montserrat Marimon et al. Finding Mentions of Abbreviations and Their Definitions in Spanish Clinical Cases: The BARR2 Shared Task Evaluation Results, 2018, IberEval@SEPLN.

[2] W. John Wilbur et al. Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC-formatted corpora, 2014, Database J. Biol. Databases Curation.

[3] Miguel Pignatelli et al. Database: The Journal of Biological Databases and Curation, 2016.

[4] Antonio Jimeno-Yepes et al. A Knowledge-Based Approach to Medical Records Retrieval, 2011, TREC.