Categorization-driven cross-language retrieval of medical information

The Web has become a large repository of documents (or pages) written in many different languages. In this context, traditional information retrieval (IR) techniques cannot be used whenever the user query and the documents being retrieved are in different languages. To address this problem, new cross-language information retrieval (CLIR) techniques have been proposed. In this work, we describe a method for cross-language retrieval of medical information. This method combines query terms and related medical concepts obtained automatically through a categorization procedure. The medical concepts are used to create a linguistic abstraction that allows retrieval of information in a language-independent way, minimizing linguistic problems such as polysemy. To evaluate our method, we carried out experiments using the OHSUMED test collection, whose documents are written in English, with queries expressed in Portuguese, Spanish, and French. The results indicate that our cross-language retrieval method is as effective as a standard vector space model algorithm operating on queries and documents in the same language. Further, our results are better than previous results in the literature. © 2006 Wiley Periodicals, Inc.
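As a rough illustration of the idea of combining query terms with language-independent medical concepts, the following minimal sketch ranks English documents against a Portuguese query. Everything in it is an assumption made for illustration: the `TERM_TO_CONCEPT` table stands in for the automatic categorization procedure, the concept identifiers are invented, and plain cosine similarity over raw counts replaces whatever weighting the actual method uses.

```python
import math
from collections import Counter

# Hypothetical stand-in for the categorization procedure: maps surface
# terms in several languages to invented, language-independent concept IDs.
TERM_TO_CONCEPT = {
    "heart": "C_HEART", "coração": "C_HEART", "corazón": "C_HEART", "coeur": "C_HEART",
    "attack": "C_INFARCT", "infarto": "C_INFARCT", "infarctus": "C_INFARCT",
    "kidney": "C_KIDNEY", "rim": "C_KIDNEY", "riñón": "C_KIDNEY", "rein": "C_KIDNEY",
}


def represent(text):
    """Return a bag of surface terms plus the concepts assigned to them."""
    terms = text.lower().split()
    concepts = [TERM_TO_CONCEPT[t] for t in terms if t in TERM_TO_CONCEPT]
    return Counter(terms + concepts)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Score every document against the query, best match first."""
    q = represent(query)
    scored = [(cosine(q, represent(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy English collection (not taken from OHSUMED).
DOCS = ["heart attack treatment", "kidney failure outcomes"]

if __name__ == "__main__":
    for score, doc in rank("infarto coração", DOCS):
        print(f"{score:.3f}  {doc}")
```

In this toy setting the Portuguese query shares no surface terms with either English document, so any match comes entirely from the shared concept identifiers, which is the language-independent abstraction the abstract refers to.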
