Paper information - A cross-lingual framework for monolingual biomedical information retrieval

An important challenge for biomedical information retrieval (IR) is dealing with complex, inconsistent and ambiguous biomedical terminology. A concept-based representation, defined in terms of a domain-specific terminological resource, is frequently employed to deal with this challenge. In this paper, we approach the incorporation of a concept-based representation in monolingual biomedical IR from a cross-lingual perspective. In the proposed framework, this is realized by translating and matching between text and concept-based representations. The approach allows for the deployment of a rich set of techniques proposed and evaluated in traditional cross-lingual IR (CLIR). We compare six translation models and measure their effectiveness in the biomedical domain. We demonstrate that the approach can result in significant improvements in retrieval effectiveness over word-based retrieval. Moreover, we demonstrate that the effectiveness of a CLIR framework for monolingual biomedical IR increases when basic translation models are combined.
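As a rough illustration of the idea of translating between text and concept-based representations, the sketch below maps a word-based query to a distribution over concepts and then combines two translation models by linear interpolation. It is not the paper's implementation: the function names, the toy translation tables, the concept identifiers, the uniform word weighting, and the interpolation weight are all assumptions made for this example, and the abstract does not specify which six models are compared or how they are combined.

```python
from collections import defaultdict


def translate_query(query_words, translation_table):
    """Translate a bag of words into a concept distribution.

    Computes P(c|Q) = sum_w P(c|w) * P(w|Q), with P(w|Q) assumed uniform.
    translation_table maps word -> {concept: P(concept | word)}.
    """
    if not query_words:
        return {}
    p_word = 1.0 / len(query_words)  # assumption: every query word weighs the same
    concept_probs = defaultdict(float)
    for word in query_words:
        # Words missing from the table contribute nothing.
        for concept, prob in translation_table.get(word, {}).items():
            concept_probs[concept] += prob * p_word
    return dict(concept_probs)


def combine_models(dist_a, dist_b, weight=0.5):
    """Linearly interpolate two concept distributions (hypothetical combination rule)."""
    concepts = set(dist_a) | set(dist_b)
    return {
        c: weight * dist_a.get(c, 0.0) + (1.0 - weight) * dist_b.get(c, 0.0)
        for c in concepts
    }


# Toy translation tables with made-up concept identifiers and probabilities.
table_a = {
    "heart": {"C_HEART": 1.0},
    "attack": {"C_MI": 0.5, "C_ASSAULT": 0.5},
}
table_b = {
    "heart": {"C_HEART": 0.5, "C_MI": 0.5},
    "attack": {"C_MI": 1.0},
}

query = ["heart", "attack"]
concepts_a = translate_query(query, table_a)
concepts_b = translate_query(query, table_b)
combined = combine_models(concepts_a, concepts_b, weight=0.5)

# Rank concepts by combined probability, highest first.
ranking = sorted(combined, key=combined.get, reverse=True)
```

In this toy setting the first table splits "attack" evenly between two senses, while the second favours the myocardial-infarction concept; interpolating the two pushes that concept to the top of `ranking`, which is the kind of effect a combination of translation models might be hoped to have.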
