Improving the accessibility of biomedical texts by semantic enrichment and definition expansion

We present work aimed at improving the comprehensibility of health-related English-Spanish parallel texts through the semantic annotation of biomedical concepts and the automatic expansion of their definitions. To overcome the scarcity of resources available for Spanish, we propose exploiting existing tools targeted at English and then transferring the resulting annotations to the Spanish texts. The evaluations performed show the feasibility of this approach. An enriched set of texts is made available, which can be retrieved, visualized and downloaded through a web interface.
