A cross-lingual framework for monolingual biomedical information retrieval

An important challenge for biomedical information retrieval (IR) is dealing with complex, inconsistent and ambiguous biomedical terminology. A common way to address this challenge is to use a concept-based representation defined in terms of a domain-specific terminological resource. In this paper, we approach the incorporation of a concept-based representation in monolingual biomedical IR from a cross-lingual perspective. In the proposed framework, this is realized by translating and matching between text and concept-based representations. The approach allows the deployment of a rich set of techniques proposed and evaluated in traditional cross-lingual IR (CLIR). We compare six translation models and measure their effectiveness in the biomedical domain. We demonstrate that the approach can result in significant improvements in retrieval effectiveness over word-based retrieval. Moreover, we demonstrate that the effectiveness of a CLIR framework for monolingual biomedical IR increases when basic translation models are combined.
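As an illustration of the idea of translating a word-based query into a concept-based representation and of combining translation models, the following is a minimal sketch under stated assumptions. It is not the method evaluated in the paper: the abstract does not specify the six translation models or how they are combined, so linear interpolation is used here only as one common option. The function names, the concept identifiers and all probabilities are hypothetical toy values.

```python
from collections import defaultdict


def combine_models(model_a, model_b, lam=0.5):
    """Linearly interpolate two word-to-concept translation models.

    Each model maps a word to a dict {concept: P(concept | word)}.
    Interpolation is an assumed combination strategy, chosen for illustration.
    """
    combined = {}
    for word in set(model_a) | set(model_b):
        pa = model_a.get(word, {})
        pb = model_b.get(word, {})
        combined[word] = {
            concept: lam * pa.get(concept, 0.0) + (1.0 - lam) * pb.get(concept, 0.0)
            for concept in set(pa) | set(pb)
        }
    return combined


def translate_query(words, model):
    """Translate a word query into a weighted concept representation.

    Assumes a uniform distribution over query words, so that
    P(concept | query) = (1 / |query|) * sum over words of P(concept | word).
    Words missing from the model contribute nothing.
    """
    if not words:
        return {}
    concepts = defaultdict(float)
    for word in words:
        for concept, prob in model.get(word, {}).items():
            concepts[concept] += prob / len(words)
    return dict(concepts)


# Hypothetical toy models; concept identifiers are made up for this example.
model_a = {
    "tumor": {"C_NEOPLASM": 1.0},
    "cold": {"C_COMMON_COLD": 0.5, "C_LOW_TEMPERATURE": 0.5},
}
model_b = {
    "tumor": {"C_NEOPLASM": 0.8, "C_SWELLING": 0.2},
    "cold": {"C_COMMON_COLD": 1.0},
}

combined = combine_models(model_a, model_b, lam=0.5)
concept_query = translate_query(["cold", "tumor"], combined)
```

The resulting `concept_query` could then be matched against concept-based document representations, in the same way that a translated query is matched against documents in cross-lingual IR.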
