We present ARBOREx, a system for abbreviation recognition and resolution in biomedical texts based on a curated set of lexical resources. For explicit ShortForm-ExtendedForm (SF-EF) relations found in the text, a ranked dynamic regular expression approach is used, complemented by access to Google Translation Services when needed. For EF resolution when an SF candidate has been identified in the source text, a knowledge-based approach is adopted instead, implemented as a set of simplified patterns that compile to regular expressions applied over the text stream. These patterns perform EF selection by exploiting linguistic form (word co-occurrence in discourse) as a means to map an SF to its term meaning. Some of the patterns use a very preliminary version of a conceptual classification of SF-EF pairs. In addition, a simple bag-of-words model that computes the overlap between the content words of the EF and those of the text is used as a fallback strategy. The system was used in the BARR2 track, described elsewhere [1]. On the BARR2 datasets, ARBOREx achieved high precision and recall on both tasks, with F1 scores of 88.42 for task 1 and 83.17 for task 2.
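The bag-of-words fallback mentioned above can be illustrated with a minimal sketch. It is not the ARBOREx implementation: the stopword list, the tokenizer, the scoring formula (share of EF content words that also occur in the text) and all function names are assumptions made for illustration only. The example uses Spanish because BARR2 targets Spanish clinical cases [1].

```python
import re

# Hypothetical, deliberately tiny Spanish stopword list; a real system
# would rely on a curated resource.
STOPWORDS = {"de", "la", "el", "en", "y", "a", "del", "los", "las",
             "con", "por", "un", "una", "se", "no"}


def content_words(text):
    """Lowercase the text, split it into word tokens and drop stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def overlap_score(extended_form, context):
    """Fraction of the EF's content words that also appear in the context."""
    ef_words = content_words(extended_form)
    if not ef_words:
        return 0.0
    return len(ef_words & content_words(context)) / len(ef_words)


def pick_extended_form(candidates, context):
    """Return the candidate EF with the highest overlap score (assumed tie-break: first wins)."""
    return max(candidates, key=lambda ef: overlap_score(ef, context))


# Illustrative data, not taken from the BARR2 corpus.
context = "Se realiza TAC craneal; la tomografía no muestra lesiones."
candidates = ["tomografía axial computarizada", "tensión arterial central"]
best = pick_extended_form(candidates, context)
```

Here `best` is the candidate that shares the word "tomografía" with the context; the other candidate shares no content word and scores zero. A fallback of this kind only makes sense once the pattern-based strategies have failed to select an EF.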
[1] Montserrat Marimon, et al. Finding Mentions of Abbreviations and Their Definitions in Spanish Clinical Cases: The BARR2 Shared Task Evaluation Results. IberEval@SEPLN, 2018.
[2] W. John Wilbur, et al. Finding abbreviations in biomedical literature: three BioC-compatible modules and four BioC-formatted corpora. Database J. Biol. Databases Curation, 2014.
[3] Miguel Pignatelli, et al. Database: The Journal of Biological Databases and Curation. 2016.
[4] Antonio Jimeno-Yepes, et al. A Knowledge-Based Approach to Medical Records Retrieval. TREC, 2011.